Abstract. We study the runtime in probabilistic programs with unbounded
recursion. As underlying formal model for such programs we use probabilistic
pushdown automata (pPDA) which exactly correspond to recursive Markov
chains. We show that every pPDA can be transformed into a stateless pPDA
(called "pBPA") whose runtime and further properties are closely related to
those of the original pPDA. This result substantially simplifies the analysis of
runtime and other pPDA properties. We prove that for every pPDA the probability
of performing a long run decreases exponentially in the length of the run,
if and only if the expected runtime in the pPDA is finite. If the expectation
is infinite, then the probability decreases "polynomially". We show that these
bounds are asymptotically tight. Our tail bounds on the runtime are generic,
i.e., applicable to any probabilistic program with unbounded recursion. An
intuitive interpretation is that in pPDA the runtime is exponentially unlikely to
deviate from its expected value.

1 Introduction 


We study the termination time in programs with unbounded recursion, which are either 
randomized or operate on statistically quantified inputs. As underlying formal model 
for such programs we use probabilistic pushdown automata (pPDA) [15, 16, 7, 4] which 
are equivalent to recursive Markov chains [20, 18, 19]. Since pushdown automata are a 
standard and well-established model for programs with recursive procedure calls, our 
abstract results imply generic and tight tail bounds for termination time, the main 
performance characteristic of probabilistic recursive programs. 
A pPDA consists of a finite set of control states, a finite stack alphabet, and a finite
set of rules of the form $pX \stackrel{x}{\hookrightarrow} q\alpha$, where $p, q$ are control states, $X$ is a stack
symbol, $\alpha$ is a finite sequence of stack symbols (possibly empty), and $x \in (0, 1]$ is the
(rational) probability of the rule. We require that for each $pX$, the sum of the
probabilities of all rules of the form $pX \stackrel{x}{\hookrightarrow} q\alpha$ is equal to 1. Each pPDA $\Delta$ induces an
infinite-state Markov chain $M_\Delta$, where the states are configurations of the form $p\alpha$
($p$ is the current control state and $\alpha$ is the current stack content), and
$pX\beta \stackrel{x}{\to} q\alpha\beta$ is a transition of $M_\Delta$ iff $pX \stackrel{x}{\hookrightarrow} q\alpha$ is a rule of $\Delta$. We also stipulate that
$p\varepsilon \stackrel{1}{\to} p\varepsilon$ for every control state $p$, where $\varepsilon$ denotes the empty stack. For example,
consider the pPDA $\hat{\Delta}$ with two control states $p, q$, two stack symbols $X, Y$, and the rules



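$$pX \stackrel{1/4}{\hookrightarrow} p\varepsilon, \quad pX \stackrel{1/4}{\hookrightarrow} pXX, \quad pX \stackrel{1/2}{\hookrightarrow} qY, \quad pY \stackrel{1}{\hookrightarrow} pY, \quad qY \stackrel{1/2}{\hookrightarrow} qX, \quad qY \stackrel{1/2}{\hookrightarrow} q\varepsilon, \quad qX \stackrel{1}{\hookrightarrow} qY.$$

The structure of the Markov chain $M_{\hat{\Delta}}$ is indicated below.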




pPDA can model programs that use unbounded "stack-like" data structures such 
as stacks, counters, or even queues (in some cases, the exact ordering of items stored 
in a queue is irrelevant and the queue can be safely replaced with a stack). Transi- 
tion probabilities may reflect the random choices of the program (such as "coin flips" 
in randomized algorithms) or some statistical assumptions about the input data. In 
particular, pPDA model recursive programs. The global data of such a program are 
stored in the finite control, and the individual procedures and functions together with 
their local data correspond to the stack symbols (a function call/return is modeled 
by pushing/popping the associated stack symbol onto/from the stack). As a simple 
example, consider the recursive program Tree of Figure 1, which computes the value 
of an And/Or-tree, i.e., a tree such that (i) every node has either zero or two children, 
(ii) every inner node is either an And-node or an Or-node, and (iii) on any path from 
the root to a leaf And- and Or-nodes alternate. We further assume that the root is
either a leaf or an And-node. Tree starts by invoking the function And on the root of
a given And/Or-tree. Observe that the program evaluates subtrees only if necessary.
Now assume that the inputs are random And/Or-trees following the Galton-Watson
distribution: a node of the tree has two children with probability 1/2, and no children
with probability 1/2. Furthermore, the conditional probabilities that a childless node
evaluates to 0 and 1 are also both equal to 1/2. On inputs with this distribution, the
algorithm corresponds to the pPDA $\Delta_{Tree}$ of Figure 1 (the control states $r_0$ and $r_1$ model
the return values 0 and 1).

We study the termination time of runs in a given pPDA $\Delta$. For every pair of control
states $p, q$ and every stack symbol $X$ of $\Delta$, let $Run(pXq)$ be the set of all runs (infinite
paths) in $M_\Delta$ initiated in $pX$ which visit $q\varepsilon$. The termination time is modeled by the
random variable $T_{pX}$, which to every run $w$ assigns either the number of steps needed



function And(node)
if node.leaf then
    return node.value
else
    v := Or(node.left)
    if v = 0 then
        return 0
    else
        return Or(node.right)

function Or(node)
if node.leaf then
    return node.value
else
    v := And(node.left)
    if v = 1 then
        return 1
    else
        return And(node.right)
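
$qA \stackrel{1/4}{\hookrightarrow} r_1\varepsilon$        $qO \stackrel{1/4}{\hookrightarrow} r_1\varepsilon$
$qA \stackrel{1/4}{\hookrightarrow} r_0\varepsilon$        $qO \stackrel{1/4}{\hookrightarrow} r_0\varepsilon$
$qA \stackrel{1/2}{\hookrightarrow} qOA$         $qO \stackrel{1/2}{\hookrightarrow} qAO$
$r_0A \stackrel{1}{\hookrightarrow} r_0\varepsilon$        $r_1O \stackrel{1}{\hookrightarrow} r_1\varepsilon$
$r_1A \stackrel{1}{\hookrightarrow} qO$          $r_0O \stackrel{1}{\hookrightarrow} qA$

Fig. 1. The program Tree and its pPDA model $\Delta_{Tree}$.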

to reach a configuration with empty stack, or $\infty$ if there is no such configuration. The
conditional expected value $E[T_{pX} \mid Run(pXq)]$, denoted just by $E[pXq]$ for short, then
corresponds to the average number of steps needed to reach $q\varepsilon$ from $pX$, computed only
for those runs initiated in $pX$ which terminate in $q\varepsilon$. For example, using the results of
[15, 16, 20], one can show that the functions And and Or of the program Tree terminate
with probability one, and the expected termination times can be computed by solving
a system of linear equations. Thus, we obtain the following:



$E[qAr_0] = 7.155113$      $E[qAr_1] = 7.172218$
$E[qOr_0] = 7.172218$      $E[qOr_1] = 7.155113$
$E[r_0Ar_0] = 1.000000$     $E[r_1Ar_0] = 8.172218$     $E[r_1Ar_1] = 8.155113$
$E[r_1Or_1] = 1.000000$     $E[r_0Or_1] = 8.172218$     $E[r_0Or_0] = 8.155113$


However, the mere expectation of the termination time does not provide much information
about its distribution until we analyze the associated tail bound, i.e., the probability
that the termination time deviates from its expected value by a given amount. That
is, we are interested in bounds for the conditional probability $\mathcal{P}(T_{pX} \ge n \mid Run(pXq))$.
(Note this probability makes sense regardless of whether $E[pXq]$ is finite or infinite.)
Assuming that the (conditional) expectation and variance of $T_{pX}$ are finite,
one can apply Markov's and Chebyshev's inequalities and thus obtain bounds of
the form $\mathcal{P}(T_{pX} \ge n \mid Run(pXq)) \le c/n$ and $\mathcal{P}(T_{pX} \ge n \mid Run(pXq)) \le c/n^2$, respectively,
where $c$ is a constant depending only on the underlying pPDA. However, these
bounds are asymptotically always worse than our exponential bound (see below). If
$E[pXq]$ is infinite, these inequalities cannot be used at all.

Our contribution. The main contributions of this paper are the following: 



— We show that every pPDA can be effectively transformed into a stateless pPDA 
(called "pBPA" ) so that all important quantitative characteristics of runs are pre- 
served. This simple (but fundamental) observation was overlooked in previous 
works on pPDA and related models [15,16,7,4,20,18,19], although it simplifies 
virtually all of these results. Hence, we can w.l.o.g. concentrate just on the study 
of pBPA. Moreover, for the runtime analysis, the transformation yields a pBPA 
all of whose symbols terminate with probability one, which further simplifies the 
analysis. 

— We provide tail bounds for $T_{pX}$ which are asymptotically optimal for every pPDA
and are applicable also in the case when $E[pXq]$ is infinite. More precisely, we
show that for every pair of control states $p, q$ and every stack symbol $X$, there are
essentially three possibilities:

• There is a "small" $k$ such that $\mathcal{P}(T_{pX} \ge n \mid Run(pXq)) = 0$ for all $n \ge k$.

• $E[pXq]$ is finite and $\mathcal{P}(T_{pX} \ge n \mid Run(pXq))$ decreases exponentially in $n$.

• $E[pXq]$ is infinite and $\mathcal{P}(T_{pX} \ge n \mid Run(pXq))$ decreases "polynomially" in $n$.

The exact formulation of this result, including the explanation of what is meant
by a "polynomial" decrease, is given in Theorem 7 (technically, Theorem 7 is
formulated for pBPA which terminate with probability one, which is no restriction
as explained above). Observe that a direct consequence of the above theorem is
that all conditional moments $E[T_{pX}^k \mid Run(pXq)]$ are simultaneously either finite
or infinite (in particular, if $E[pXq]$ is finite, then so is the conditional variance
of $T_{pX}$).

The characterization given in Theorem 7 is effective. In particular, it is decidable in
polynomial space whether $E[pXq]$ is finite or infinite by using the results of [15, 16,
20], and if $E[pXq]$ is finite, we can compute concrete bounds on the probabilities. Our
results vastly improve on what was previously known on the termination time $T_{pX}$.
Previous work, in particular [16, 3], has focused on computing expectations and variances
for a class of random variables on pPDA runs, a class that includes $T_{pX}$ as prime
example. Note that our exponential bound given in Theorem 7 depends, like Markov's
inequality, only on expectations, which can be efficiently approximated by the methods
of [16, 14].

An intuitive interpretation of our results is that pPDA with finite (conditional) 
expected termination time are well-behaved in the sense that the termination time is 
exponentially unlikely to deviate from its expectation. Of course, a detailed analysis of 
a concrete pPDA may lead to better bounds, but these bounds will be asymptotically 
equivalent to our generic bounds. Also note that the conditional expected termination
time can be finite even for pPDA that do not terminate with probability one. Hence, for
every $\varepsilon > 0$ we can compute a tight threshold $k$ such that if a given pPDA terminates
at all, it terminates after at most $k$ steps with probability $1 - \varepsilon$ (this is useful for
interrupting programs that are supposed but not guaranteed to terminate).

Proof techniques. The main mathematical tool for establishing our results on run- 
time is (basic) martingale theory and its tools such as the optional stopping theorem 
and Azuma's inequality (see Section 4) . More precisely, we construct two different mar- 
tingales corresponding to the cases when the expected termination time is finite resp. 
infinite. In combination with our reduction to pBPA this establishes a powerful link 
between pBPA, pPDA, and martingale theory. 



Our analysis of termination time in the case when the expected termination time 
is infinite builds on Perron-Frobenius theory for nonnegative matrices as well as on 
recent results from [20, 14]. We also use some of the observations presented in [15, 16, 7].

Related work. The application of Azuma's inequality in the analysis of particular 
randomized algorithms is also known as the method of bounded differences; see, e.g., 
[26, 12] and the references therein. In contrast, we apply martingale methods not to 
particular algorithms, but to the pPDA model as a whole. 

Analyzing the distribution of termination time is closely related to the analysis of 
multitype branching processes (MT-BPs) [21]. A MT-BP is very much like a pBPA 
(see above). The stack symbols in pBPA correspond to species in MT-BPs. An $\varepsilon$-rule
corresponds to the death of an individual, whereas a rule with two or more symbols 
on the right hand side corresponds to reproduction. Since in MT-BPs the symbols 
on the right hand side of rules evolve concurrently, termination time in pBPA does 
not correspond to extinction time in MT-BPs, but to the size of the total progeny of 
an individual, i.e., the number of direct or indirect descendants of an individual. The 
distribution of the total progeny of a MT-BP has been studied mainly for the case of 
a single species, see, e.g., [21,27,28] and the references therein, but to the best of our 
knowledge, no tail bounds for MT-BPs have been given. Hence, Theorem 7 can also 
be seen as a contribution to MT-BP theory. 

Stochastic context-free grammars (SCFGs) [25] are also closely related to pBPA. 
The termination time in pBPA corresponds to the number of nodes in a derivation tree 
of a SCFG, so our analysis of pBPA immediately applies to SCFGs. Quasi-Birth-Death 
processes (QBDs) can also be seen as a special case of pPDA. A QBD is a generalization 
of a birth-death process studied in queueing theory and applied probability (see, e.g.,
[24, 2, 17]). Intuitively, a QBD describes an unbounded queue, using a counter to count 
the number of jobs in the queue, where the queue can be in one of finitely many distinct 
"modes" . Hence, a (discrete-time) QBD can be equivalently defined by a pPDA with 
one stack symbol used to emulate the counter. These special pPDA are also known 
as probabilistic one-counter automata (pOC) [17,6,5]. Recently, it has been shown 
in [8] that every pOC induces a martingale apt for studying the properties of both 
terminating and nonterminating runs in pOC. The construction is based on ideas 
specific to pOC that are completely unrelated to the ones presented in this paper. 

Previous work on pPDA and the equivalent model of recursive Markov chains in- 
cludes [15,16,7,4,20,18,19]. In this paper we use many of the results presented in 
these papers, which is explicitly acknowledged at appropriate places. 

Organization of the paper. We present our results after some preliminaries in 
Section 2. In Section 3 we show how to transform a given pPDA into an equivalent 
pBPA, and in Section 4 we design the promised martingales and derive tight tail bounds 
for the termination time. We conclude in Section 5. Some proofs have been moved to 
Section 6. 

2 Preliminaries 

In the rest of this paper, $\mathbb{N}$, $\mathbb{N}_0$, and $\mathbb{R}$ denote the set of positive integers, non-negative
integers, and real numbers, respectively. The tuples of $A_1 \times A_2 \times \cdots \times A_n$ are often written
simply as $a_1 a_2 \ldots a_n$. The set of all finite words over a given alphabet $\Sigma$ is denoted
by $\Sigma^*$, and the set of all infinite words over $\Sigma$ is denoted by $\Sigma^\omega$. We write $\varepsilon$ for the
empty word. The length of a given $w \in \Sigma^* \cup \Sigma^\omega$ is denoted by $|w|$, where the length
of an infinite word is $\infty$. Given a word $w$ (finite or infinite) over $\Sigma$, the individual letters
of $w$ are denoted by $w(0), w(1), \ldots$ For $X \in \Sigma$ and $w \in \Sigma^*$, we denote by $\#(X)(w)$
the number of occurrences of $X$ in $w$.

Definition 1 (Markov Chains). A Markov chain is a triple $M = (S, \to, Prob)$
where $S$ is a finite or countably infinite set of states, $\to \subseteq S \times S$ is a transition
relation, and $Prob$ is a function which to each transition $s \to t$ of $M$ assigns its
probability $Prob(s \to t) > 0$ so that for every $s \in S$ we have $\sum_{s \to t} Prob(s \to t) = 1$ (as usual,
we write $s \stackrel{x}{\to} t$ instead of $Prob(s \to t) = x$).

A path in $M$ is a finite or infinite word $w \in S^+ \cup S^\omega$ such that $w(i-1) \to w(i)$ for
every $1 \le i < |w|$. For a state $s$, we use $FPath(s)$ to denote the set of all finite paths
initiated in $s$. A run in $M$ is an infinite path in $M$. We denote by $Run[M]$ the set
of all runs in $M$. The set of all runs that start with a given finite path $w$ is denoted
by $Run[M](w)$. When $M$ is understood, we write just $Run$ and $Run(w)$ instead of
$Run[M]$ and $Run[M](w)$, respectively. Given $s \in S$ and $A \subseteq S$, we say $A$ is reachable
from $s$ if there is a run $w$ such that $w(0) = s$ and $w(i) \in A$ for some $i \ge 0$.

To every $s \in S$ we associate the probability space $(Run(s), \mathcal{F}, \mathcal{P})$ where $\mathcal{F}$ is the
$\sigma$-field generated by all basic cylinders $Run(w)$ where $w$ is a finite path starting with $s$,
and $\mathcal{P} : \mathcal{F} \to [0, 1]$ is the unique probability measure such that $\mathcal{P}(Run(w)) = \prod_{i=1}^{|w|-1} x_i$
where $w(i-1) \stackrel{x_i}{\to} w(i)$ for every $1 \le i < |w|$. If $|w| = 1$, we put $\mathcal{P}(Run(w)) = 1$. Note
that only certain subsets of $Run(s)$ are $\mathcal{P}$-measurable, but in this paper we only deal
with "safe" subsets that are guaranteed to be in $\mathcal{F}$.

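Definition 2 (probabilistic PDA). A probabilistic pushdown automaton (pPDA)
is a tuple $\Delta = (Q, \Gamma, \hookrightarrow, Prob)$ where $Q$ is a finite set of control states, $\Gamma$ is a
finite stack alphabet, $\hookrightarrow \subseteq (Q \times \Gamma) \times (Q \times \Gamma^{\le 2})$ is a transition relation (where
$\Gamma^{\le 2} = \{\alpha \in \Gamma^*, |\alpha| \le 2\}$), and $Prob$ is a function which to each transition $pX \hookrightarrow q\alpha$
assigns its probability $Prob(pX \hookrightarrow q\alpha) > 0$ so that for all $p \in Q$ and $X \in \Gamma$ we
have that $\sum_{pX \hookrightarrow q\alpha} Prob(pX \hookrightarrow q\alpha) = 1$. As usual, we write $pX \stackrel{x}{\hookrightarrow} q\alpha$ instead of
$Prob(pX \hookrightarrow q\alpha) = x$.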

Elements of $Q \times \Gamma^*$ are called configurations of $\Delta$. A pPDA with just one control state
is called pBPA (see footnote 6). In what follows, configurations of pBPA are usually written
without the (only) control state $p$ (i.e., we write just $\alpha$ instead of $p\alpha$). We define the size of a
pPDA $\Delta$ as $|\Delta| = |Q| + |\Gamma| + |{\hookrightarrow}| + |Prob|$, where $|Prob|$ is the sum of sizes of binary
representations of values taken by $Prob$. To $\Delta$ we associate the Markov chain $M_\Delta$ with
$Q \times \Gamma^*$ as the set of states and transitions defined as follows:

— $p\varepsilon \stackrel{1}{\to} p\varepsilon$ for each $p \in Q$;

— $pX\beta \stackrel{x}{\to} q\alpha\beta$ is a transition of $M_\Delta$ iff $pX \stackrel{x}{\hookrightarrow} q\alpha$ is a transition of $\Delta$.

For all $pXq \in Q \times \Gamma \times Q$ and $rY \in Q \times \Gamma$, we define

— $Run(pXq) = \{w \in Run(pX) \mid w(i) = q\varepsilon \text{ for some } i \in \mathbb{N}\}$

— $Run(rY{\uparrow}) = Run(rY) \setminus \bigcup_{s \in Q} Run(rYs)$.

Further, we put $[pXq] = \mathcal{P}(Run(pXq))$ and $[pX{\uparrow}] = \mathcal{P}(Run(pX{\uparrow}))$. If $\Delta$ is a pBPA,
we write $[X]$ and $[X{\uparrow}]$ instead of $[pXp]$ and $[pX{\uparrow}]$, where $p$ is the only control state
of $\Delta$.

Footnote 6: The "BPA" acronym stands for "Basic Process Algebra" and it is used mainly for
historical reasons. pBPA are closely related to stochastic context-free grammars and are also
called 1-exit recursive Markov chains (see, e.g., [20]).

Let $p\alpha \in Q \times \Gamma^*$. We denote by $T_{p\alpha}$ a random variable over $Run(p\alpha)$ where $T_{p\alpha}(w)$
is either the least $n \in \mathbb{N}_0$ such that $w(n) = q\varepsilon$ for some $q \in Q$, or $\infty$ if there is no
such $n$. Intuitively, $T_{p\alpha}(w)$ is the number of steps ("the time") in which the run $w$
initiated in $p\alpha$ terminates. We write $E[p\alpha] := E[T_{p\alpha}]$ for the expected termination
time (usually omitting the control state $p$ for pBPA).
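The induced Markov chain and the termination time can be sketched in a few lines. The code below is an illustration only: it encodes the two-state example pPDA from the introduction under the assumption that the rules for pY and qX have probability 1, represents a configuration as a pair (state, stack tuple with the top symbol first), and uses the hypothetical names `successors` and `termination_time`.

```python
import random

# Assumed example pPDA: (state, top symbol) -> [(probability, new state, pushed symbols)].
RULES = {
    ("p", "X"): [(0.25, "p", ()), (0.25, "p", ("X", "X")), (0.5, "q", ("Y",))],
    ("p", "Y"): [(1.0, "p", ("Y",))],
    ("q", "Y"): [(0.5, "q", ("X",)), (0.5, "q", ())],
    ("q", "X"): [(1.0, "q", ("Y",))],
}

def successors(config, rules):
    """One-step distribution of the induced Markov chain from (state, stack)."""
    state, stack = config
    if not stack:  # empty stack: self-loop with probability 1
        return {config: 1.0}
    top, rest = stack[0], stack[1:]
    dist = {}
    for x, new_state, pushed in rules[(state, top)]:
        nxt = (new_state, pushed + rest)
        dist[nxt] = dist.get(nxt, 0.0) + x
    return dist

def termination_time(config, rules, rng, max_steps=10_000):
    """Sample the termination time of one run; None if no empty stack within max_steps."""
    for n in range(max_steps + 1):
        if not config[1]:
            return n
        dist = successors(config, rules)
        config = rng.choices(list(dist), weights=list(dist.values()))[0]
    return None

print(successors(("p", ("X", "X")), RULES))
print(termination_time(("q", ("Y",)), RULES, random.Random(0)))
```

The cut-off `max_steps` is a simulation artifact and not part of the model: a run that never empties its stack has termination time infinity.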



3 Transforming pPDA into pBPA 

Let $\Delta = (Q, \Gamma, \hookrightarrow, Prob)$ be a pPDA. We show how to construct a pBPA $\Delta_\bullet$ which is
"equivalent" to $\Delta$ in a well-defined sense. This construction is a relatively straightforward
modification of the standard method for transforming a PDA into an equivalent
context-free grammar (see, e.g., [22]), but has so far been overlooked in the existing
literature on probabilistic PDA. The idea behind this method is to construct a BPA
with stack symbols of the form $\langle pXq \rangle$ for all $p, q \in Q$ and $X \in \Gamma$. Roughly speaking,
such a triple corresponds to terminating paths from $pX$ to $q\varepsilon$. Subsequently, transitions
of the BPA are induced by transitions of the PDA in a way corresponding to this intuition.
For example, a transition of the form $pX \hookrightarrow rYZ$ induces transitions of the form
$\langle pXq \rangle \hookrightarrow \langle rYs \rangle \langle sZq \rangle$ for all $s \in Q$. Then each path from $pX$ to $q\varepsilon$ maps naturally to a
path from $\langle pXq \rangle$ to $\varepsilon$. This construction can also be applied in the probabilistic setting
by assigning probabilities to transitions so that the probability of the corresponding
paths is preserved. We also deal with nonterminating runs by introducing new stack
symbols of the form $\langle pX{\uparrow} \rangle$.

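Formally, the stack alphabet of $\Delta_\bullet$ is defined as follows: For every $pX \in Q \times \Gamma$
such that $[pX{\uparrow}] > 0$ we add a stack symbol $\langle pX{\uparrow} \rangle$, and for every $pXq \in Q \times \Gamma \times Q$
such that $[pXq] > 0$ we add a stack symbol $\langle pXq \rangle$. Note that the stack alphabet of $\Delta_\bullet$
is effectively constructible in polynomial space by applying the results of [15, 20].

Now we construct the rules $\hookrightarrow_\bullet$ of $\Delta_\bullet$. For all $\langle pXq \rangle$ we have the following rules:

— if $pX \stackrel{x}{\hookrightarrow} rYZ$ in $\Delta$, then for all $s \in Q$ such that $y = x \cdot [rYs] \cdot [sZq] > 0$ we put
$\langle pXq \rangle \stackrel{y/[pXq]}{\hookrightarrow}_\bullet \langle rYs \rangle \langle sZq \rangle$;

— if $pX \stackrel{x}{\hookrightarrow} rY$ in $\Delta$, where $y = x \cdot [rYq] > 0$, we put $\langle pXq \rangle \stackrel{y/[pXq]}{\hookrightarrow}_\bullet \langle rYq \rangle$;

— if $pX \stackrel{x}{\hookrightarrow} q\varepsilon$ in $\Delta$, we put $\langle pXq \rangle \stackrel{x/[pXq]}{\hookrightarrow}_\bullet \varepsilon$.

For all $\langle pX{\uparrow} \rangle$ we have the following rules:

— if $pX \stackrel{x}{\hookrightarrow} rYZ$ in $\Delta$, then for every $s \in Q$ where $y = x \cdot [rYs] \cdot [sZ{\uparrow}] > 0$ we add
$\langle pX{\uparrow} \rangle \stackrel{y/[pX\uparrow]}{\hookrightarrow}_\bullet \langle rYs \rangle \langle sZ{\uparrow} \rangle$;

— for all $qY \in Q \times \Gamma$ where $y = [qY{\uparrow}] \cdot \sum_{pX \hookrightarrow qY\beta} Prob(pX \hookrightarrow qY\beta) > 0$, we add
$\langle pX{\uparrow} \rangle \stackrel{y/[pX\uparrow]}{\hookrightarrow}_\bullet \langle qY{\uparrow} \rangle$.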



Note that the transition probabilities of $\Delta_\bullet$ may take irrational values. Still, the
construction of $\Delta_\bullet$ is to some extent "effective" due to the following proposition:

Proposition 3 ([15, 20]). Let $\Delta = (Q, \Gamma, \hookrightarrow, Prob)$ be a pPDA. Let $pXq \in Q \times \Gamma \times Q$.
There is a formula $\Phi(x)$ of $ExTh(\mathbb{R})$ (the existential theory of the reals) with one free
variable $x$ such that the length of $\Phi(x)$ is polynomial in $|\Delta|$ and $\Phi(x/r)$ is valid iff
$r = [pXq]$.

Using Proposition 3, one can compute formulae of $ExTh(\mathbb{R})$ that "encode" transition
probabilities of $\Delta_\bullet$. Moreover, these probabilities can be effectively approximated up
to an arbitrarily small error by employing either the decision procedure for $ExTh(\mathbb{R})$
[10] or by using Newton's method [13, 23, 14].

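Example 4. Consider a pPDA $\Delta$ with two control states, $p, q$, one stack symbol, $X$,
and the following transition rules:

$$pX \stackrel{a}{\hookrightarrow} qXX, \quad pX \stackrel{1-a}{\hookrightarrow} q\varepsilon, \quad qX \stackrel{b}{\hookrightarrow} pXX, \quad qX \stackrel{1-b}{\hookrightarrow} p\varepsilon,$$

where both $a, b$ are greater than 1/2. Apparently, $[pXp] = [qXq] = 0$. Using results
of [15] one can easily verify that $[pXq] = (1 - a)/b$ and $[qXp] = (1 - b)/a$. Thus
$[pX{\uparrow}] = (a + b - 1)/b$ and $[qX{\uparrow}] = (a + b - 1)/a$. Thus the stack symbols of $\Delta_\bullet$ are
$\langle pXq \rangle$, $\langle qXp \rangle$, $\langle pX{\uparrow} \rangle$, $\langle qX{\uparrow} \rangle$. The transition rules of $\Delta_\bullet$ are:

$\langle pXq \rangle \stackrel{1-b}{\hookrightarrow}_\bullet \langle qXp \rangle \langle pXq \rangle$    $\langle pXq \rangle \stackrel{b}{\hookrightarrow}_\bullet \varepsilon$    $\langle qXp \rangle \stackrel{1-a}{\hookrightarrow}_\bullet \langle pXq \rangle \langle qXp \rangle$    $\langle qXp \rangle \stackrel{a}{\hookrightarrow}_\bullet \varepsilon$

$\langle pX{\uparrow} \rangle \stackrel{1-b}{\hookrightarrow}_\bullet \langle qXp \rangle \langle pX{\uparrow} \rangle$    $\langle pX{\uparrow} \rangle \stackrel{b}{\hookrightarrow}_\bullet \langle qX{\uparrow} \rangle$    $\langle qX{\uparrow} \rangle \stackrel{1-a}{\hookrightarrow}_\bullet \langle pXq \rangle \langle qX{\uparrow} \rangle$    $\langle qX{\uparrow} \rangle \stackrel{a}{\hookrightarrow}_\bullet \langle pX{\uparrow} \rangle$

As both $a, b$ are greater than 1/2, the resulting pBPA has a tendency to remove symbols
rather than add symbols. Thus both $\langle pXq \rangle$ and $\langle qXp \rangle$ terminate with probability 1.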
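Example 4 can be checked numerically for concrete parameters. The sketch below assumes the rules pX -> qXX (probability a), pX -> q eps (1 - a), qX -> pXX (b), qX -> p eps (1 - b), and assumes that each rule of the constructed pBPA gets its weight y divided by the termination probability of its left-hand side. It approximates [pXq] and [qXp] by iterating their defining equations from 0 instead of using the results of [15]; the function name `example4` and the string encoding of symbols are hypothetical.

```python
def example4(a, b, iterations=1000):
    """Approximate [pXq], [qXp] and derive the rule probabilities of the pBPA."""
    pq = qp = 0.0  # approximations of [pXq] and [qXp]
    for _ in range(iterations):
        pq, qp = (1 - a) + a * qp * pq, (1 - b) + b * pq * qp
    p_up, q_up = 1 - pq, 1 - qp  # [pX^] and [qX^], using [pXp] = [qXq] = 0
    rules = {
        "<pXq>": {"<qXp><pXq>": a * qp * pq / pq, "eps": (1 - a) / pq},
        "<qXp>": {"<pXq><qXp>": b * pq * qp / qp, "eps": (1 - b) / qp},
        "<pX^>": {"<qXp><pX^>": a * qp * p_up / p_up, "<qX^>": a * q_up / p_up},
        "<qX^>": {"<pXq><qX^>": b * pq * q_up / q_up, "<pX^>": b * p_up / q_up},
    }
    return pq, qp, rules

pq, qp, rules = example4(0.6, 0.7)
print(pq, qp)
for lhs, rhs in rules.items():
    print(lhs, rhs)
```

For a = 0.6 and b = 0.7 the computed probabilities agree with the closed forms (1 - a)/b and (1 - b)/a, and the outgoing probabilities of every symbol sum to 1.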

When studying long-run properties of pPDA (such as $\omega$-regular properties or limit-average
properties), one usually assumes that the runs are initiated in a configuration
$p_0X_0$ which cannot terminate, i.e., $[p_0X_0{\uparrow}] = 1$. Under this assumption, the probability
spaces over $Run[M_\Delta](p_0X_0)$ and $Run[M_{\Delta_\bullet}](\langle p_0X_0{\uparrow} \rangle)$ are "isomorphic" w.r.t. all
properties that depend only on the control states and the top-of-the-stack symbols of
the configurations visited along a run. This is formalized in our next proposition.

Proposition 5. Let $p_0X_0 \in Q \times \Gamma$ such that $[p_0X_0{\uparrow}] = 1$. Then there is a partial
function $\Upsilon : Run[M_\Delta](p_0X_0) \to Run[M_{\Delta_\bullet}](\langle p_0X_0{\uparrow} \rangle)$ such that for every
$w \in Run[M_\Delta](p_0X_0)$, where $\Upsilon(w)$ is defined, and every $n \in \mathbb{N}$ we have the following: if
$w(n) = qY\beta$, then $\Upsilon(w)(n) = \langle qY{\dagger} \rangle \gamma$, where $\dagger$ is either an element of $Q$ or $\uparrow$. Further,
for every measurable set of runs $R \subseteq Run[M_{\Delta_\bullet}](\langle p_0X_0{\uparrow} \rangle)$ we have that $\Upsilon^{-1}(R)$ is
measurable and $\mathcal{P}(R) = \mathcal{P}(\Upsilon^{-1}(R))$.

As for terminating runs, observe that the "terminating" symbols of the form $\langle pXq \rangle$ do
not depend on the "nonterminating" symbols of the form $\langle pX{\uparrow} \rangle$, i.e., if we restrict $\Delta_\bullet$
just to terminating symbols, we again obtain a pBPA. A straightforward computation
reveals the following proposition about terminating runs that is crucial for our results
presented in the next section.

Proposition 6. Let $pXq \in Q \times \Gamma \times Q$ and $[pXq] > 0$. Then almost all runs of $M_{\Delta_\bullet}$
initiated in $\langle pXq \rangle$ terminate, i.e., reach $\varepsilon$. Further, for all $n \in \mathbb{N}$ we have that

$$\mathcal{P}(T_{pX} = n \mid Run(pXq)) = \mathcal{P}(T_{\langle pXq \rangle} = n \mid Run(\langle pXq \rangle)).$$



Observe that this proposition, together with a very special form of rules in $\Delta_\bullet$, implies
that all configurations reachable from a nonterminating configuration $\langle p_0X_0{\uparrow} \rangle$ have the
form $\alpha \langle qY{\uparrow} \rangle$, where $\alpha$ terminates almost surely and $\langle qY{\uparrow} \rangle$ never terminates. It follows
that such a pBPA can be transformed into a finite-state Markov chain (whose states
are the nonterminating symbols) which is allowed to make recursive calls that almost
surely terminate (using rules of the form $\langle pX{\uparrow} \rangle \hookrightarrow_\bullet \langle rZq \rangle \langle qY{\uparrow} \rangle$). This observation is
very useful when investigating the properties of nonterminating runs, and many of the
existing results about pPDA can be substantially simplified using this result.

4 Analysis of pBPA 

In this section we establish the promised tight tail bounds for the termination time.
By virtue of Proposition 6, it suffices to analyze almost surely terminating pBPA, i.e.,
pBPA all of whose stack symbols terminate with probability 1. In what follows we assume
that $\Delta$ is such a pBPA, and we also fix an initial stack symbol $X_0$. For $X, Y \in \Gamma$, we
say that $X$ depends directly on $Y$, if there is a rule $X \hookrightarrow \alpha$ such that $Y$ occurs in $\alpha$.
Further, we say that $X$ depends on $Y$, if either $X$ depends directly on $Y$, or $X$ depends
directly on a symbol $Z \in \Gamma$ which depends on $Y$. One can compute, in linear time,
the directed acyclic graph (DAG) of strongly connected components (SCCs) of the
dependence relation. The height of this DAG, denoted by $h$, is defined as the longest
distance between a top SCC and a bottom SCC plus 1 (i.e., $h = 1$ if there is only
one SCC). We can safely assume that all symbols on which $X_0$ does not depend were
removed from $\Delta$. We abbreviate $\mathcal{P}(T_{X_0} \ge n \mid Run(X_0))$ to $\mathcal{P}(T_{X_0} \ge n)$, and we use
$p_{\min}$ to denote $\min\{p \mid X \stackrel{p}{\hookrightarrow} \alpha \text{ in } \Delta\}$. Here is our main result:
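The height h of the DAG of SCCs can be computed from the rules alone. The sketch below is an illustration for small alphabets: it uses a simple transitive closure rather than a linear-time SCC algorithm, and the name `scc_dag_height` as well as the encoding of a pBPA as a dictionary from symbols to right-hand sides are assumptions.

```python
from functools import lru_cache

def scc_dag_height(rules):
    """Height of the SCC DAG of the dependence relation (1 if there is one SCC).

    rules maps each stack symbol to a list of right-hand sides (tuples of symbols).
    """
    symbols = set(rules) | {Y for rhss in rules.values() for rhs in rhss for Y in rhs}
    direct = {X: {Y for rhs in rules.get(X, []) for Y in rhs} for X in symbols}
    # Transitive closure of "depends directly on" (quadratic, for illustration only).
    reach = {X: set(direct[X]) for X in symbols}
    changed = True
    while changed:
        changed = False
        for X in symbols:
            new = set().union(*(reach[Y] for Y in reach[X])) - reach[X]
            if new:
                reach[X] |= new
                changed = True

    def scc(X):
        return frozenset({X} | {Y for Y in reach[X] if X in reach[Y]})

    @lru_cache(maxsize=None)
    def height(component):
        below = {scc(Y) for X in component for Y in direct[X]} - {component}
        return 1 + max((height(c) for c in below), default=0)

    return max(height(scc(X)) for X in symbols)

# X depends on Y, but Y does not depend on X: two SCCs stacked on top of each other.
print(scc_dag_height({"X": [("X", "Y"), ()], "Y": [("Y", "Y"), ()]}))
```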

Theorem 7. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$. Assume
that $X_0 \in \Gamma$ depends on all $X \in \Gamma \setminus \{X_0\}$, and let $p_{\min} = \min\{p \mid X \stackrel{p}{\hookrightarrow} \alpha \text{ in } \Delta\}$.
Then one of the following is true:

(1) $\mathcal{P}(T_{X_0} \ge 2^{|\Gamma|}) = 0$.

(2) $E[X_0]$ is finite and for all $n \in \mathbb{N}$ with $n \ge 2E[X_0]$ we have that

$$p_{\min}^n \le \mathcal{P}(T_{X_0} \ge n) \le \exp\left(1 - \frac{n}{8E_{\max}^2}\right)$$

where $E_{\max} = \max_{X \in \Gamma} E[X]$.

(3) $E[X_0]$ is infinite and there is $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$ we have that

$$c/n^{1/2} \le \mathcal{P}(T_{X_0} \ge n) \le d_1/n^{d_2}$$

where $d_1 = 18h|\Gamma|/p_{\min}^{3|\Gamma|}$, and $d_2 = 1/(2^{h+1} - 2)$. Here, $h$ is the height of the DAG
of SCCs of the dependence relation, and $c$ is a suitable positive constant depending
on $\Delta$.

More colloquially, Theorem 7 states that A satisfies either (1) or (2) or (3), where 
(1) is when A does not have any long terminating runs; and (2) resp. (3) is when the 
expected termination time is finite (resp. infinite) and the probability of performing a 
terminating run of length n decreases exponentially (resp. polynomially) in n. 
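In case (2), the exponential bound translates directly into a cut-off for interrupting a program. The following sketch assumes that the upper bound of Theorem 7 (2) reads exp(1 - n/(8 E_max^2)) for all n >= 2 E[X_0]; the function names and the sample values of E[X_0] and E_max are hypothetical.

```python
import math

def exponential_tail_bound(n, e_max):
    """Upper bound on P(T >= n), assumed to have the form exp(1 - n / (8 * e_max**2))."""
    return math.exp(1 - n / (8 * e_max ** 2))

def threshold(eps, e_x0, e_max):
    """Smallest n admitted by the bound such that P(T >= n) <= eps."""
    n = math.ceil(8 * e_max ** 2 * (1 + math.log(1 / eps)))
    return max(n, math.ceil(2 * e_x0))  # the bound is only claimed for n >= 2 E[X0]

k = threshold(0.01, e_x0=2.0, e_max=2.0)
print(k, exponential_tail_bound(k, 2.0))
```

The threshold grows only logarithmically in 1/eps, which is the practical content of the exponential decrease.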

One can effectively distinguish between the three cases set out in Theorem 7. More 
precisely, case (1) can be recognized in polynomial time by looking only at the structure 



of the pBPA, i.e., disregarding the probabilities. Determining whether $E[X_0]$ is finite or
infinite can be done in polynomial space by employing the results of [16, 3]. This holds 
even if the transition probabilities of A are represented just symbolically by formulae 
of ExTh(R) (see Proposition 3). 

The proof of Theorem 7 is based on designing suitable martingales that are used
to analyze the concentration of the termination time. Recall that a martingale is an
infinite sequence of random variables $m^{(0)}, m^{(1)}, \ldots$ such that, for all $i \in \mathbb{N}$, $E[|m^{(i)}|] < \infty$,
and $E[m^{(i+1)} \mid m^{(1)}, \ldots, m^{(i)}] = m^{(i)}$ almost surely. If $|m^{(i)} - m^{(i-1)}| \le c_i$ for all
$i \in \mathbb{N}$, then we have the following Azuma's inequality (see, e.g., [29]):

$$\mathcal{P}(m^{(n)} - m^{(0)} \ge t) \le \exp\left(\frac{-t^2}{2\sum_{k=1}^{n} c_k^2}\right)$$
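As a sanity check of Azuma's inequality, one can compare it with an exactly computable case. The sketch below uses the simplest martingale with bounded differences, a sum of n independent fair +1/-1 steps (so c_k = 1), which is an example chosen here for illustration and not one of the martingales constructed in the proofs; the function names are hypothetical.

```python
import math

def walk_tail(n, t):
    """Exact P(S_n >= t) for a sum S_n of n independent fair +1/-1 steps."""
    k_min = max(0, (n + t + 1) // 2)  # least number of +1 steps giving S_n >= t
    return sum(math.comb(n, k) for k in range(k_min, n + 1)) / 2 ** n

def azuma_bound(n, t, c=1.0):
    """Azuma's bound exp(-t^2 / (2 n c^2)) for differences bounded by c."""
    return math.exp(-t * t / (2 * n * c * c))

for t in (5, 10, 20, 40):
    print(t, walk_tail(100, t), azuma_bound(100, t))
```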



We split the proof of Theorem 7 into four propositions (namely Propositions 8-11 
below), which together imply Theorem 7. 

The following proposition establishes the lower bound from Theorem 7 (2): 

Proposition 8. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$.
Let $p_{\min} = \min\{p \mid X \stackrel{p}{\hookrightarrow} \alpha \text{ in } \Delta\}$. Assume that $\mathcal{P}(T_{X_0} \ge 2^{|\Gamma|}) > 0$. Then we have

$$p_{\min}^n \le \mathcal{P}(T_{X_0} \ge n) \quad \text{for all } n \in \mathbb{N}.$$

Proof. Let $T_{X_0}(w) \ge n$ for some $n \in \mathbb{N}$ and some $w \in Run(X_0)$. It follows from the
definition of the probability space of a pPDA that the set of all runs starting with
$w(0), w(1), \ldots, w(n)$ has a probability of at least $p_{\min}^n$. Therefore, in order to complete
the proof, it suffices to show that $\mathcal{P}(T_{X_0} \ge 2^{|\Gamma|}) > 0$ implies $\mathcal{P}(T_{X_0} \ge n) > 0$ for all
$n \in \mathbb{N}$.

To this end, we use a form of the pumping lemma for context-free languages. Notice
that a pBPA can be regarded as a context-free grammar with probabilities (a stochastic
context-free grammar) with an empty set of terminal symbols and $\Gamma$ as the set of
nonterminal symbols. Each finite run $w \in Run(X_0)$ corresponds to a derivation tree
with root $X_0$ that derives the word $\varepsilon$. The termination time $T_{X_0}$ is the number of
(internal) nodes in the tree. In the rest of the proof we use this correspondence.

Let $\mathcal{P}(T_{X_0} \ge 2^{|\Gamma|}) > 0$. Then there is a run $w \in Run(X_0)$ with $T_{X_0}(w) \ge 2^{|\Gamma|}$. This
run $w$ corresponds to a derivation tree with at least $2^{|\Gamma|}$ (internal) nodes. Since every
node has at most two children, in this tree there is a path from the root (labeled with $X_0$)
to a leaf such that on this path there are two different nodes, both labeled with the same
symbol. Let us call those nodes $n_1$ and $n_2$, where $n_1$ is the node closer to the root. By
replacing the subtree rooted at $n_2$ with the subtree rooted at $n_1$ we obtain a larger
derivation tree. Iterating this replacement yields arbitrarily large derivation trees, and
hence terminating runs of arbitrary length. This completes the proof. □

The following proposition establishes the upper bound of Theorem 7 (2): 

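Proposition 9. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$.
Assume that $X_0$ depends on all $X \in \Gamma \setminus \{X_0\}$. Define

$$E_{\max} := \max_{X \in \Gamma} E[X] \qquad \text{and} \qquad B := \max_{X \hookrightarrow \alpha} \Big| 1 - E[X] + \sum_{Y \in \Gamma} \#(Y)(\alpha) \cdot E[Y] \Big|.$$

Then for all $n \in \mathbb{N}$ with $n \ge 2E[X_0]$ we have

$$\mathcal{P}(T_{X_0} \ge n) \le \exp\left(\frac{2E[X_0] - n}{2B^2}\right) \le \exp\left(1 - \frac{n}{8E_{\max}^2}\right).$$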
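Proof. Let $w \in Run(X_0)$. We denote by $I(w)$ the maximal number $j \ge 0$ such that
$w(j-1) \ne \varepsilon$. Given $i \ge 0$, we define $m^{(i)}(w) := E[w(i)] + \min\{i, I(w)\}$. We prove that
$E(m^{(i+1)} \mid m^{(i)}) = m^{(i)}$, i.e., $m^{(0)}, m^{(1)}, \ldots$ forms a martingale. It has been shown
in [16] that

$$E[X] = \sum_{X \stackrel{x}{\hookrightarrow} \varepsilon} x + \sum_{X \stackrel{x}{\hookrightarrow} Y} x \cdot (1 + E[Y]) + \sum_{X \stackrel{x}{\hookrightarrow} YZ} x \cdot (1 + E[Y] + E[Z])$$
$$= 1 + \sum_{X \stackrel{x}{\hookrightarrow} Y} x \cdot E[Y] + \sum_{X \stackrel{x}{\hookrightarrow} YZ} x \cdot (E[Y] + E[Z]).$$

On the other hand, let us fix a path $u \in FPath(X_0)$ of length $i$ and let $w$ be an
arbitrary run of $Run(u)$. First assume that $u(i-1) = X\alpha \in \Gamma\Gamma^*$. Then we have:

$$E[m^{(i+1)} \mid Run(u)] = \sum_{X \stackrel{x}{\hookrightarrow} \varepsilon} x \cdot (m^{(i)}(w) - E[X] + 1) + \sum_{X \stackrel{x}{\hookrightarrow} Y} x \cdot (m^{(i)}(w) - E[X] + E[Y] + 1)$$
$$\qquad + \sum_{X \stackrel{x}{\hookrightarrow} YZ} x \cdot (m^{(i)}(w) - E[X] + E[Y] + E[Z] + 1)$$
$$= m^{(i)}(w) - E[X] + 1 + \sum_{X \stackrel{x}{\hookrightarrow} Y} x \cdot E[Y] + \sum_{X \stackrel{x}{\hookrightarrow} YZ} x \cdot (E[Y] + E[Z])$$
$$= m^{(i)}(w).$$

If $u(i-1) = \varepsilon$, then for every $w \in Run(u)$ we have $m^{(i+1)}(w) = I(w) = m^{(i)}(w)$. This
proves that $m^{(0)}, m^{(1)}, \ldots$ is a martingale.

By Azuma's inequality (see [29]), we have

$$\mathcal{P}(m^{(n)} - E[X_0] \ge n - E[X_0]) \le \exp\left(\frac{-(n - E[X_0])^2}{2\sum_{k=1}^{n} B^2}\right) \le \exp\left(\frac{2E[X_0] - n}{2B^2}\right).$$

For every $w \in Run(X_0)$ we have that $w(n) \ne \varepsilon$ implies $m^{(n)} \ge n$. It follows:

$$\mathcal{P}(T_{X_0} \ge n) \le \mathcal{P}(m^{(n)} \ge n) \le \exp\left(\frac{2E[X_0] - n}{2B^2}\right) \le \exp\left(1 - \frac{n}{8E_{\max}^2}\right)$$

where the final inequality follows from the inequality $B \le 2E_{\max}$. □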
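The quantities in Proposition 9 are easy to evaluate on a small instance. The sketch below uses an assumed single-symbol pBPA with rules X -> XX (probability 1/4) and X -> eps (probability 3/4), computes E[X] by iterating the recurrence cited from [16], evaluates B as the largest absolute martingale increment, and compares the bound exp((2E[X_0] - n)/(2B^2)) with the exact tail obtained by tracking the stack height. All names are hypothetical.

```python
import math

# Assumed example pBPA: symbol -> [(probability, right-hand side as a string of symbols)].
RULES = {"X": [(0.25, "XX"), (0.75, "")]}

def expected_times(rules, iterations=500):
    """Iterate E[X] = 1 + sum over rules of x * (sum of E[Y] for Y on the right-hand side)."""
    E = dict.fromkeys(rules, 0.0)
    for _ in range(iterations):
        E = {X: 1 + sum(x * sum(E[Y] for Y in rhs) for x, rhs in rs)
             for X, rs in rules.items()}
    return E

def increment_bound(rules, E):
    """B: the largest |1 - E[X] + sum of E[Y] over the right-hand side| over all rules."""
    return max(abs(1 - E[X] + sum(E[Y] for Y in rhs))
               for X, rs in rules.items() for _, rhs in rs)

def exact_tail(n, up=0.25):
    """Exact P(T_X >= n) for the example, where the stack grows with probability up."""
    alive = {1: 1.0}  # stack height -> probability, restricted to non-terminated runs
    for _ in range(n - 1):
        nxt = {}
        for h, p in alive.items():
            nxt[h + 1] = nxt.get(h + 1, 0.0) + up * p
            if h > 1:
                nxt[h - 1] = nxt.get(h - 1, 0.0) + (1 - up) * p
        alive = nxt
    return sum(alive.values())

E = expected_times(RULES)
B = increment_bound(RULES, E)
for n in (4, 10, 20, 40):
    print(n, exact_tail(n), math.exp((2 * E["X"] - n) / (2 * B ** 2)))
```

On this instance E[X] = 2 and B = 3, so B <= 2 E_max holds, and the exact tail stays well below the generic bound.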

The following proposition establishes the upper bound of Theorem 7 (3): 

Proposition 10. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$.
Assume that $X_0$ depends on all $X \in \Gamma \setminus \{X_0\}$. Let $p_{\min} = \min\{p \mid X \stackrel{p}{\hookrightarrow} \alpha \text{ in } \Delta\}$. Let
$h$ denote the height of the DAG of SCCs. Then there is $n_0 \in \mathbb{N}$ such that

$$\mathcal{P}(T_{X_0} \ge n) \le \frac{18h|\Gamma|/p_{\min}^{3|\Gamma|}}{n^{1/(2^{h+1}-2)}} \quad \text{for all } n \ge n_0.$$

Proof (sketch; a full proof is given in Section 6.2). Assume that $E[X_0]$ is infinite. To
give some idea of the (quite involved) proof, let us first consider a simple pBPA $\Delta$ with
$\Gamma = \{X\}$ and the rules $X \stackrel{1/2}{\hookrightarrow} XX$ and $X \stackrel{1/2}{\hookrightarrow} \varepsilon$. In fact, $\Delta$ is closely related to a simple
random walk starting at 1, for which the time until it hits 0 can be exactly analyzed
(see, e.g., [29]). Clearly, we have $h = |\Gamma| = 1$ and $p_{\min} = 1/2$. Theorem 7 (3) implies
$\mathcal{P}(T_X \ge n) \in O(1/\sqrt{n})$. Let us sketch why this upper bound holds.

Let 9 > 0, define g{6) := \ ■ exp(-# • (-1)) + \ ■ cxp(-# • (+1)), and define for a 
run w £ Run(X) the sequence 



rrig (w) 



cxp(-9-\w(i)\)/g(0y 
nig (w) 



if i = or w(i — I) =/= e 
otherwise. 



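One can show (cf. [29]) that $m_\theta^{(0)}, m_\theta^{(1)}, \ldots$ is a martingale, i.e.,

\[
\mathbb{E}\bigl[m_\theta^{(i)} \mid m_\theta^{(i-1)}\bigr] = m_\theta^{(i-1)}
\]

for all $\theta > 0$. Our proof crucially depends on some analytic properties of the
function $g : \mathbb{R} \to \mathbb{R}$: It is easy to verify that $1 = g(0) < g(\theta)$ for all $\theta > 0$, and
$0 = g'(0)$, and $1 = g''(0)$. One can show that Doob's Optional-Stopping Theorem (see
Theorem 10.10 (ii) of [29]) applies, which implies $m_\theta^{(0)} = \mathbb{E}\bigl[m_\theta^{(T_X)}\bigr]$. It follows that for
all $n \in \mathbb{N}$ and $\theta > 0$ we have that

\[
\begin{aligned}
\exp(-\theta) = m_\theta^{(0)} = \mathbb{E}\bigl[m_\theta^{(T_X)}\bigr] = \mathbb{E}\bigl[g(\theta)^{-T_X}\bigr]
 &= \sum_{i=0}^{\infty} \mathcal{P}(T_X = i) \cdot g(\theta)^{-i} \qquad (1) \\
 &\le \sum_{i=0}^{n-1} \mathcal{P}(T_X = i) \cdot 1 + \sum_{i=n}^{\infty} \mathcal{P}(T_X = i) \cdot g(\theta)^{-n} \\
 &= 1 - \mathcal{P}(T_X \ge n) + \mathcal{P}(T_X \ge n) \cdot g(\theta)^{-n} .
\end{aligned}
\]

Rearranging this inequality yields

\[
\mathcal{P}(T_X \ge n) \le \frac{1 - \exp(-\theta)}{1 - g(\theta)^{-n}} ,
\]

from which one obtains, setting $\theta := 1/\sqrt{n}$, and using the mentioned properties of $g$
and several applications of l'Hopital's rule, that $\mathcal{P}(T_X \ge n) \in O(1/\sqrt{n})$.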
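The $O(1/\sqrt{n})$ behaviour for this one-symbol pBPA can be checked numerically. The following is a minimal sketch, not part of the proof: it propagates the exact distribution of the stack height step by step. The function name `survival` is an illustrative choice; the quantity computed is $\mathcal{P}(T_X > n)$, i.e., the probability that the stack is still non-empty after $n$ steps.

```python
from fractions import Fraction
from math import sqrt

def survival(n_max):
    """s[n] = P(T_X > n) for the pBPA with rules X -> XX and X -> eps, each with probability 1/2."""
    half = Fraction(1, 2)
    heights = {1: Fraction(1)}              # stack height -> probability (run not yet terminated)
    s = [Fraction(1)]
    for _ in range(n_max):
        nxt = {}
        for h, pr in heights.items():
            nxt[h + 1] = nxt.get(h + 1, 0) + pr * half      # rule X -> XX
            if h > 1:                                        # rule X -> eps, stack stays non-empty
                nxt[h - 1] = nxt.get(h - 1, 0) + pr * half
        heights = nxt
        s.append(sum(heights.values(), Fraction(0)))
    return s

s = survival(200)
for n in (50, 100, 200):
    # sqrt(n) * P(T_X > n) should stay bounded if the tail is O(1/sqrt(n)).
    print(n, float(s[n]) * sqrt(n))
```

In this sketch the rescaled values stay bounded as $n$ grows, which is what the bound predicts for $h = 1$.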

Next we sketch how we generalize this proof to pBPA that consist of only one SCC,
but have more than one stack symbol. In this case, the term $|w(i)|$ in the definition
of $m_\theta^{(i)}(w)$ needs to be replaced by the sum of weights of the symbols in $w(i)$. Each
$Y \in \Gamma$ has a weight which is drawn from the dominant eigenvector of a certain matrix,
which is characteristic for $\Delta$. Perron-Frobenius theory guarantees the existence of a
suitable weight vector $u \in \mathbb{R}_+^\Gamma$. The function $g$ consequently needs to be replaced by a
function $g_Y$ for each $Y \in \Gamma$. We need to keep the property that $g_Y''(0) > 0$. Intuitively,
this means that $\Delta$ must have, for each $Y \in \Gamma$, a rule $Y \hookrightarrow \alpha$ such that $Y$ and $\alpha$ have
different weights. This can be accomplished by transforming $\Delta$ into a certain normal
form.

Finally, we sketch how the proof is generalized to pBPA with more than one SCC.
For simplicity, assume that $\Delta$ has only two stack symbols, say $X$ and $Y$, where $X$
depends on $Y$, but $Y$ does not depend on $X$. Let us change the execution order of
pBPA as follows: whenever a rule with $\alpha \in \Gamma^*$ on the right hand side fires, then all
$X$-symbols in $\alpha$ are added on top of the stack, but all $Y$-symbols are added at the
bottom of the stack. This change does not influence the termination time of pBPA, but
it allows us to decompose runs into two phases: an $X$-phase where $X$-rules are executed
which may produce $Y$-symbols or further $X$-symbols; and a $Y$-phase where $Y$-rules are
executed which may produce further $Y$-symbols but no $X$-symbols, because $Y$ does
not depend on $X$. Arguing only qualitatively, assume that $T_X$ is "large". Then either
(a) the $X$-phase is "long" or (b) the $X$-phase is "short", but the $Y$-phase is "long". For
the probability of event (a) one can give an upper bound using the bound for one SCC,
because the produced $Y$-symbols can be ignored. For event (b), observe that if the
$X$-phase is short, then only few $Y$-symbols can be created during the $X$-phase. For a
bound on the probability of event (b) we need a bound on the probability that a pBPA
with one SCC and a "short" initial configuration takes a "long" time to terminate. The
previously sketched proof for an initial configuration with a single stack symbol can
be suitably generalized to handle other "short" configurations. All details are given in
Section 6.2. □

The following proposition establishes the lower bound of Theorem 7 (3): 

Proposition 11. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$.
Assume that $X_0$ depends on all $X \in \Gamma \setminus \{X_0\}$. Assume $E[X_0] = \infty$. Then there is
$c > 0$ such that

\[
\frac{c}{\sqrt{n}} \le \mathcal{P}(T_{X_0} \ge n) \quad \text{for all } n \in \mathbb{N}.
\]

The proof of Proposition 11 follows the lines of the previous proof sketch, but with an
additional trick: To obtain the desired bound, one needs to take the derivative with
respect to $\theta$ on both sides of Equation (1). The full proof is given in Section 6.3.

Tightness of the bounds in the case of infinite expectation. If $E[X_0]$ is infinite,
the lower and upper bounds of Theorem 7 (3) asymptotically coincide in the "strongly
connected" case (i.e., where $h = 1$ holds for the height of the DAG of the SCCs of
the dependence relation). In other words, in the strongly connected case we must have
$\mathcal{P}(T \ge n) \in \Theta(1/\sqrt{n})$. Otherwise (i.e., for larger $h$) the upper bound in Theorem 7 (3)
cannot be substantially tightened. This follows from the following proposition:

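Proposition 12. Let $\Delta_h$ be the pBPA with $\Gamma_h = \{X_1, \ldots, X_h\}$ and the following
rules:

\[
X_h \xhookrightarrow{1/2} X_h X_h ,\ X_h \xhookrightarrow{1/2} X_{h-1} ,\ \ldots ,\ X_2 \xhookrightarrow{1/2} X_2 X_2 ,\ X_2 \xhookrightarrow{1/2} X_1 ,\ X_1 \xhookrightarrow{1/2} X_1 X_1 ,\ X_1 \xhookrightarrow{1/2} \varepsilon
\]

Then $[X_h] = 1$, $E[X_h] = \infty$, and there is $c_h > 0$ with

\[
\frac{c_h}{n^{1/2^h}} \le \mathcal{P}(T_{X_h} \ge n) \quad \text{for all } n \in \mathbb{N}.
\]

Proposition 12 is proved in Section 6.4.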
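To get a feeling for how slowly the tail of $T_{X_h}$ decays, one can compute it exactly for small $n$. The sketch below is an illustration only and makes one simplifying assumption that is justified later in the paper for pBPA: the distribution of the termination time does not depend on the order in which stack symbols are processed, so it suffices to track how many copies of each symbol are on the stack. The function name `survival_h` and the tuple encoding of configurations are hypothetical choices.

```python
from fractions import Fraction

def survival_h(h, n_max):
    """s[n] = P(T_{X_h} > n) for the pBPA Delta_h, tracking only symbol counts."""
    half = Fraction(1, 2)
    start = tuple(1 if i == h - 1 else 0 for i in range(h))   # counts of X_1, ..., X_h
    dist = {start: Fraction(1)}
    s = [Fraction(1)]
    for _ in range(n_max):
        nxt = {}
        for counts, pr in dist.items():
            top = max(i for i in range(h) if counts[i] > 0)    # process the highest index first
            dup = list(counts)
            dup[top] += 1                                       # X_i -> X_i X_i
            low = list(counts)
            low[top] -= 1                                       # X_i -> X_{i-1}, or X_1 -> eps
            if top > 0:
                low[top - 1] += 1
            for c in (tuple(dup), tuple(low)):
                if any(c):                                      # non-empty stack: still running
                    nxt[c] = nxt.get(c, 0) + pr * half
        dist = nxt
        s.append(sum(dist.values(), Fraction(0)))
    return s

s1, s2 = survival_h(1, 40), survival_h(2, 40)
for n in (10, 20, 40):
    print(n, float(s1[n]), float(s2[n]))
```

For every $n$ the tail for $h = 2$ lies above the tail for $h = 1$, consistent with the slower decay rate stated in the proposition.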

5 Conclusions and Future Work 

We have provided a reduction from stateful to stateless pPDA which gives new insights 
into the theory of pPDA and at the same time simplifies it substantially. We have 
used this reduction and martingale theory to exhibit a dichotomy result that precisely 
characterizes the distribution of the termination time in terms of its expected value. 



Although the bounds presented in this paper are asymptotically optimal, there
is still room for improvement. We conjecture that our results can be extended to
more general reward-based models, where each configuration is assigned a nonnegative 
reward and the total reward accumulated in a given service is considered instead of 
its length. This is particularly challenging if the rewards are unbounded (for example, 
the reward assigned to a given configuration may correspond to the total memory 
allocated by the procedures in the current call stack). Full answers to these questions 
would generalize some of the existing deep results about simpler models, and probably 
reveal an even richer underlying theory of pPDA which is still undiscovered. 

6 Proofs 

In this section we give the missing proofs for the stated results. Some additional
notation is used in the proofs.

- Given two sets $K \subseteq \Sigma^*$ and $L \subseteq \Sigma^* \cup \Sigma^\omega$, we use $K \cdot L$ (or just $KL$) to denote
the concatenation of $K$ and $L$, i.e., $KL = \{ww' \mid w \in K, w' \in L\}$.

- For a run $w$ and $i \in \mathbb{N}$, we write $w_i$ to denote the run $w(i)\, w(i+1) \ldots$.

6.1 Proofs of Propositions 5 and 6 

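Proposition 5. Let $p_0X_0 \in Q \times \Gamma$ such that $[p_0X_0{\uparrow}] = 1$. Then there is a partial
function $\Upsilon : Run[M_\Delta](p_0X_0) \to Run[M_{\Delta_\bullet}](\langle p_0X_0{\uparrow}\rangle)$ such that for every $w \in
Run[M_\Delta](p_0X_0)$, where $\Upsilon(w)$ is defined, and every $n \in \mathbb{N}$ we have the following: if
$w(n) = qY\beta$, then $\Upsilon(w)(n) = \langle qY\dagger\rangle\gamma$, where $\dagger$ is either an element of $Q$ or $\uparrow$. Further,
for every measurable set of runs $R \subseteq Run[M_{\Delta_\bullet}](\langle p_0X_0{\uparrow}\rangle)$ we have that $\Upsilon^{-1}(R)$ is
measurable and $\mathcal{P}(R) = \mathcal{P}(\Upsilon^{-1}(R))$.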

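Proof. Let $w \in Run[M_\Delta](p_0X_0)$. We define an infinite sequence $\bar{w}$ over $\Gamma_\bullet^*$ inductively
as follows:

- $\bar{w}(0) = \langle p_0X_0{\uparrow}\rangle$

- If $\bar{w}(i) = \varepsilon$ (which intuitively means that an "error" was indicated while defining
the first $i$ symbols of $\bar{w}$), then $\bar{w}(i+1) = \varepsilon$. Now let us assume that $\bar{w}(i) = \langle pX\dagger\rangle\alpha$,
where $\dagger \in Q \cup \{\uparrow\}$, and $w(i) = pX\gamma$ for some $\gamma \in \Gamma^*$. Let $pX \xhookrightarrow{x} r\beta$ be the rule of
$\Delta$ used to derive the transition $w(i) \to w(i+1)$. Then

\[
\bar{w}(i+1) =
\begin{cases}
\alpha & \text{if } \beta = \varepsilon \text{ and } \dagger = r; \\
\langle rY\dagger\rangle\alpha & \text{if } \beta = Y \text{ and } [rY\dagger] > 0; \\
\langle rYs\rangle\langle sZ\dagger\rangle\alpha & \text{if } \beta = YZ,\ [sZ\dagger] > 0, \text{ and there is } k > i \text{ such that } w(k) = sZ\gamma \text{ and} \\
 & |w(j)| > |w(i)| \text{ for all } i < j < k; \\
\langle rY{\uparrow}\rangle\alpha & \text{if } \beta = YZ,\ [rY{\uparrow}] > 0, \text{ and } |w(j)| > |w(i)| \text{ for all } j > i; \\
\varepsilon & \text{otherwise.}
\end{cases}
\]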



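We say that $w \in Run[M_\Delta](p_0X_0)$ is valid if $\bar{w}(i) \ne \varepsilon$ for all $i \in \mathbb{N}$. One can easily
check that if $w$ is valid, then $\bar{w}$ is a run of $\Delta_\bullet$ initiated in $\langle p_0X_0{\uparrow}\rangle$. We put $\Upsilon(w) = \bar{w}$
for all valid $w \in Run[M_\Delta](p_0X_0)$. For invalid runs, $\Upsilon$ stays undefined.

It follows directly from the definition of $\bar{w}$ that for every valid $w \in Run[M_\Delta](p_0X_0)$
and every $i \in \mathbb{N}$ we have that if $w(i) = qY\beta$ then $\bar{w}(i) = \langle qY\dagger\rangle\gamma$, where $\dagger \in Q \cup \{\uparrow\}$.

Now we check that for every measurable set of runs $R \subseteq Run[M_{\Delta_\bullet}](\langle p_0X_0{\uparrow}\rangle)$ we
have that $\Upsilon^{-1}(R)$ is measurable and $\mathcal{P}(R) = \mathcal{P}(\Upsilon^{-1}(R))$. First, realize that the set
of all invalid $w \in Run[M_\Delta](p_0X_0)$ is measurable and its probability is zero. Hence, it
suffices to show that for every finite path $v$ in $M_{\Delta_\bullet}$ initiated in $\langle p_0X_0{\uparrow}\rangle$ we have that
$\Upsilon^{-1}(Run[M_{\Delta_\bullet}](v))$ is measurable and $\mathcal{P}(\Upsilon^{-1}(Run[M_{\Delta_\bullet}](v))) = \mathcal{P}(Run[M_{\Delta_\bullet}](v))$. For
simplicity, we write just $\Upsilon^{-1}(v)$ instead of $\Upsilon^{-1}(Run[M_{\Delta_\bullet}](v))$.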

Observe that every configuration $\gamma$ reachable from $\langle p_0X_0{\uparrow}\rangle$ in $M_{\Delta_\bullet}$ is of the form
$\gamma = \langle p_1X_1p_2\rangle \cdots \langle p_kX_kp_{k+1}\rangle \langle p_{k+1}Y{\uparrow}\rangle$ where $k \ge 0$. We put

\[
P[\gamma] = [p_1X_1p_2] \cdots [p_kX_kp_{k+1}] \cdot [p_{k+1}Y{\uparrow}]
\]

Further, we say that a configuration $p\alpha$ of $\Delta$ is compatible with $\gamma$ if $p = p_1$ and
$\alpha = X_1 \cdots X_k Y\beta$ for some $\beta \in \Gamma^*$. A run $w$ initiated in such a compatible configuration
$p_1X_1 \cdots X_kY\beta$ models $\gamma$, written $w \models \gamma$, if $w$ is of the form

\[
p_1X_1 \cdots X_kY\beta \to^* p_2X_2 \cdots X_kY\beta \to^* \cdots \to^* p_{k+1}Y\beta \to \cdots
\]

where for all $1 \le i \le k$, the stack length of all intermediate configurations visited along
the subpath $p_iX_i \cdots X_kY\beta \to^* p_{i+1}X_{i+1} \cdots X_kY\beta$ is at least $|X_i \cdots X_kY\beta|$. Further,
the stack length in all configurations visited after $p_{k+1}Y\beta$ is at least $|Y\beta|$. A
straightforward induction on $k$ reveals that

\[
\mathcal{P}\bigl(\{w \in Run(p_1X_1 \cdots X_kY\beta) \mid w \models \gamma\}\bigr) = P[\gamma] \qquad (2)
\]

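Let $v\alpha$, where $\alpha \in \Gamma_\bullet^*$, be a finite path in $M_{\Delta_\bullet}$ initiated in $\langle p_0X_0{\uparrow}\rangle$, and let $\mathcal{E}(v\alpha)$ be the
set of all finite paths $\hat{v}A$ in $M_\Delta$ initiated in $p_0X_0$ such that $A \in Q \times \Gamma^*$, $|\hat{v}A| = |v\alpha|$, and
$\Upsilon^{-1}(v\alpha)$ contains a run that starts with $\hat{v}A$. One can easily check that if $\hat{v}A \in \mathcal{E}(v\alpha)$,
then $A$ is compatible with $\alpha$. Further,

\[
\Upsilon^{-1}(v\alpha) = \bigcup_{\hat{v}A \in \mathcal{E}(v\alpha)} \hat{v} \odot \{w \in Run[M_\Delta](A) \mid w \models \alpha\} \qquad (3)
\]

From (3) we obtain that $\Upsilon^{-1}(v\alpha)$ is measurable, and by combining (2) and (3) we
obtain

\[
\mathcal{P}(\Upsilon^{-1}(v\alpha)) = P[\alpha] \cdot \sum_{\hat{v}A \in \mathcal{E}(v\alpha)} \mathcal{P}(Run(\hat{v}A)) \qquad (4)
\]

Now we show that $\mathcal{P}(\Upsilon^{-1}(v\alpha)) = \mathcal{P}(Run(v\alpha))$. We proceed by induction on $|v\alpha|$. The
base case when $v\alpha = \langle p_0X_0{\uparrow}\rangle$ is immediate. Now suppose that $v\alpha = u\beta\alpha$, where $\beta \xrightarrow{x} \alpha$.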



By applying (3) and (4) we obtain 

■piY- 1 (u0a)) = V I (J uBA&{w£Run(A)\w\=a} 

\uBAGS(u0a) 

= P\ (J uB0 |J {w e Run(BA) | uBie£(^ci),M) H /W h«} 
\uBe£(u0) AeQxr* 

Yl V{Run(uB))-r[ (J {we i?wn(BA) | uBie£(^a),«) hA™i h"} 
uBe£(ap) \AgQxT' 

=* Y^ V(Run(uB))-P[(3}-x 

uBe£(uf3) 

= x-V{T- l {uP)) 

= P(Run(u/3a)) 

The (*) equality is proved by case analysis (we distinguish possible forms of the rule
which generates the transition $\beta \to \alpha$). □

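Proposition 6. Let $pXq \in Q \times \Gamma \times Q$ and $[pXq] > 0$. Then almost all runs of $M_{\Delta_\bullet}$
initiated in $\langle pXq\rangle$ terminate, i.e., reach $\varepsilon$. Further, for all $n \in \mathbb{N}$ we have that

\[
\mathcal{P}(T_{pX} = n \mid Run(pXq)) = \mathcal{P}(T_{\langle pXq\rangle} = n \mid Run(\langle pXq\rangle))
\]

Proof. For every $n \in \mathbb{N}$ we define

\[
\begin{aligned}
D_{pXq}(n) &:= \mathcal{P}(Run(pXq),\ T_{pX} = n \mid Run(pX)) \\
D_{\langle pXq\rangle}(n) &:= \mathcal{P}(T_{\langle pXq\rangle} = n \mid Run(\langle pXq\rangle))
\end{aligned}
\]

We prove the following:

\[
D_{pXq}(n) = [pXq] \cdot D_{\langle pXq\rangle}(n) . \qquad (5)
\]

Notice that (5) implies $\mathcal{P}(T_{pX} = n \mid Run(pXq)) = \mathcal{P}(T_{\langle pXq\rangle} = n \mid Run(\langle pXq\rangle))$, as
$\mathcal{P}(T_{pX} = n \mid Run(pXq)) = D_{pXq}(n)/[pXq]$.

To prove (5), we proceed by induction on $n$. First, assume that $n = 1$. If $pX \xhookrightarrow{x} q\varepsilon$,
then $\langle pXq\rangle \xhookrightarrow{y} \varepsilon$, where $y = x/[pXq]$, and thus

\[
D_{pXq}(1) = x = \frac{x}{[pXq]} \cdot [pXq] = [pXq] \cdot y = [pXq] \cdot D_{\langle pXq\rangle}(1) .
\]

If there is no rule $pX \hookrightarrow q\varepsilon$ in $\Delta$, then there is no rule $\langle pXq\rangle \hookrightarrow \varepsilon$ in $\Delta_\bullet$.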



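Assume that $n > 1$. Let us first prove that $D_{pXq}(n)$ can be decomposed according
to the first step:

\[
D_{pXq}(n) = \sum_{pX \xhookrightarrow{x} rY} x \cdot D_{rYq}(n-1)
 + \sum_{i=1}^{n-1} \sum_{pX \xhookrightarrow{x} rYZ} \sum_{s \in Q} x \cdot D_{rYs}(i) \cdot D_{sZq}(n-i-1) \qquad (6)
\]

To prove (6) we introduce some notation. For every $rYs \in Q \times \Gamma \times Q$ and $i \in \mathbb{N}$ we
denote by $B_{rYs}(i)$ the set of all paths from $rY$ to $s\varepsilon$ of length $i$. We also denote by
$B_{rYs}(i)\lfloor Z$ the set of all paths of the form $p_0\alpha_0Z \cdots p_i\alpha_iZ$ where $p_0\alpha_0 \cdots p_i\alpha_i$ belongs
to $B_{rYs}(i)$. We have

\[
B_{pXq}(n) = \bigcup_{pX \hookrightarrow rY} \{pX\} \cdot B_{rYq}(n-1)
 \ \cup\ \bigcup_{i=1}^{n-1} \bigcup_{pX \hookrightarrow rYZ} \bigcup_{s \in Q} \{pX\} \cdot B_{rYs}(i)\lfloor Z \cdot B_{sZq}(n-i-1)
\]

where all the unions are disjoint. Now the probability of following a path of $B_{rYs}(i)\lfloor Z$
is equal to the probability of following a path of $B_{rYs}(i)$, which is $D_{rYs}(i)$. Thus we
have that

\[
\begin{aligned}
\mathcal{P}\bigl(Run(\{pX\} \cdot B_{rYs}(i)\lfloor Z \cdot B_{sZq}(n-i-1))\bigr)
 &= x \cdot \mathcal{P}\bigl(Run(B_{rYs}(i)\lfloor Z \cdot B_{sZq}(n-i-1))\bigr) \\
 &= x \cdot \mathcal{P}\bigl(Run(B_{rYs}(i)\lfloor Z)\bigr) \cdot \mathcal{P}\bigl(Run(B_{sZq}(n-i-1))\bigr) \\
 &= x \cdot \mathcal{P}\bigl(Run(B_{rYs}(i))\bigr) \cdot D_{sZq}(n-i-1) \\
 &= x \cdot D_{rYs}(i) \cdot D_{sZq}(n-i-1).
\end{aligned}
\]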

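It follows that

\[
\begin{aligned}
D_{pXq}(n) &= \mathcal{P}(Run(B_{pXq}(n))) \\
 &= \mathcal{P}\Bigl(Run\Bigl(\bigcup_{pX \hookrightarrow rY} \{pX\} \cdot B_{rYq}(n-1)
  \ \cup\ \bigcup_{i=1}^{n-1} \bigcup_{pX \hookrightarrow rYZ} \bigcup_{s \in Q} \{pX\} \cdot B_{rYs}(i)\lfloor Z \cdot B_{sZq}(n-i-1)\Bigr)\Bigr) \\
 &= \sum_{pX \xhookrightarrow{x} rY} x \cdot \mathcal{P}(Run(B_{rYq}(n-1)))
  + \sum_{i=1}^{n-1} \sum_{pX \xhookrightarrow{x} rYZ} \sum_{s \in Q} x \cdot \mathcal{P}(Run(B_{rYs}(i))) \cdot \mathcal{P}(Run(B_{sZq}(n-i-1))) \\
 &= \sum_{pX \xhookrightarrow{x} rY} x \cdot D_{rYq}(n-1)
  + \sum_{i=1}^{n-1} \sum_{pX \xhookrightarrow{x} rYZ} \sum_{s \in Q} x \cdot D_{rYs}(i) \cdot D_{sZq}(n-i-1) ,
\end{aligned}
\]

which proves (6). Now we are ready to finish the induction proof of (5).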

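\[
\begin{aligned}
D_{pXq}(n) &= \sum_{pX \xhookrightarrow{x} rY} x \cdot D_{rYq}(n-1)
  + \sum_{i=1}^{n-1} \sum_{pX \xhookrightarrow{x} rYZ} \sum_{s \in Q} x \cdot D_{rYs}(i) \cdot D_{sZq}(n-i-1) \\
 &= \sum_{pX \xhookrightarrow{x} rY} x \cdot D_{\langle rYq\rangle}(n-1) \cdot [rYq]
  + \sum_{i=1}^{n-1} \sum_{pX \xhookrightarrow{x} rYZ} \sum_{s \in Q} x \cdot D_{\langle rYs\rangle}(i) \cdot [rYs] \cdot D_{\langle sZq\rangle}(n-i-1) \cdot [sZq] \\
 &= [pXq] \cdot \Bigl( \sum_{pX \xhookrightarrow{x} rY} \frac{x \cdot [rYq]}{[pXq]} \cdot D_{\langle rYq\rangle}(n-1)
  + \sum_{i=1}^{n-1} \sum_{pX \xhookrightarrow{x} rYZ} \sum_{s \in Q} \frac{x \cdot [rYs] \cdot [sZq]}{[pXq]} \cdot D_{\langle rYs\rangle}(i) \cdot D_{\langle sZq\rangle}(n-i-1) \Bigr) \\
 &= [pXq] \cdot \Bigl( \sum_{\langle pXq\rangle \xhookrightarrow{y} \langle rYq\rangle} y \cdot D_{\langle rYq\rangle}(n-1)
  + \sum_{i=1}^{n-1} \sum_{\langle pXq\rangle \xhookrightarrow{y} \langle rYs\rangle\langle sZq\rangle} y \cdot D_{\langle rYs\rangle}(i) \cdot D_{\langle sZq\rangle}(n-i-1) \Bigr) \\
 &= [pXq] \cdot D_{\langle pXq\rangle}(n)
\end{aligned}
\]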

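Finally, observe that $\sum_{n=1}^{\infty} D_{\langle pXq\rangle}(n)$ is the probability of reaching $\varepsilon$ from $\langle pXq\rangle$ and
that, by (5),

\[
\sum_{n=1}^{\infty} D_{\langle pXq\rangle}(n) = \frac{1}{[pXq]} \sum_{n=1}^{\infty} D_{pXq}(n) = \frac{[pXq]}{[pXq]} = 1 .
\]

□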
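The first-step decomposition (6) is directly computable. The following sketch evaluates it with exact rationals; it is meant as an illustration, and the rule encoding (a dictionary from `(state, symbol)` to a list of `(probability, next_state, pushed_symbols)` triples) as well as the name `termination_dist` are assumptions made here, not notation from the paper.

```python
from fractions import Fraction
from functools import lru_cache

def termination_dist(rules, states, n_max, p, X, q):
    """Return [D_{pXq}(n) for n = 0..n_max] using the first-step recurrence (6)."""

    @lru_cache(maxsize=None)
    def D(p, X, q, n):
        if n <= 0:
            return Fraction(0)
        total = Fraction(0)
        for x, r, beta in rules.get((p, X), ()):
            if len(beta) == 0:                      # rule pX -> r eps: only counts for n = 1
                if n == 1 and r == q:
                    total += x
            elif len(beta) == 1:                    # rule pX -> rY
                total += x * D(r, beta[0], q, n - 1)
            else:                                   # rule pX -> rYZ
                Y, Z = beta
                for s in states:
                    for i in range(1, n):
                        total += x * D(r, Y, s, i) * D(s, Z, q, n - i - 1)
        return total

    return [D(p, X, q, n) for n in range(n_max + 1)]

half = Fraction(1, 2)
# One control state "p"; rules X -> XX and X -> eps, each with probability 1/2.
rules = {("p", "X"): [(half, "p", ("X", "X")), (half, "p", ())]}
print(termination_dist(rules, ["p"], 5, "p", "X", "p"))
```

With a single control state this reduces to the termination-time distribution of the pBPA from the proof sketch of Proposition 10; only odd termination times have positive probability there.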

6.2 Proof of Proposition 10 

In this subsection we prove Proposition 10. Given a finite set $\Gamma$, we regard the elements
of $\mathbb{R}^\Gamma$ as vectors. Given two vectors $u, v \in \mathbb{R}^\Gamma$, we define a scalar product by setting
$u \bullet v := \sum_{X \in \Gamma} u(X) \cdot v(X)$. Further, elements of $\mathbb{R}^{\Gamma \times \Gamma}$ are regarded as matrices, with
the usual matrix-vector multiplication.

It will be convenient for the proof to measure the termination time of pBPA starting
in an arbitrary initial configuration $\alpha_0 \in \Gamma\Gamma^*$, not just with a single initial symbol
$X_0 \in \Gamma$. To this end we generalize $T_{X_0}$, $Run(X_0)$, etc. to $T_{\alpha_0}$, $Run(\alpha_0)$, etc. in the
straightforward way.

It will also be convenient to allow "pBPA" that have transition rules with more than 
two stack symbols on the right-hand side. We call them relaxed pBPA. All concepts 
associated to a pBPA, e.g., the induced Markov chain, termination time, etc., are 
defined analogously for relaxed pBPA. 

A relaxed pBPA is called strongly connected, if the DAG of the dependence relation 
on its stack alphabet consists of a single SCC. 



For any $\alpha \in \Gamma^*$, define $\#(\alpha)$ as the Parikh image of $\alpha$, i.e., the vector of $\mathbb{N}^\Gamma$ such
that $\#(\alpha)(Y)$ is the number of occurrences of $Y$ in $\alpha$. Given a relaxed pBPA $\Delta$, let
$A_\Delta \in \mathbb{R}^{\Gamma \times \Gamma}$ be the matrix with

\[
A_\Delta(X, Y) = \sum_{X \xhookrightarrow{p} \alpha} p \cdot \#(\alpha)(Y).
\]

We drop the subscript of $A_\Delta$ if $\Delta$ is clear from the context. Intuitively, $A(X, Y)$ is the
expected number of $Y$-symbols pushed on the stack when executing a rule with $X$ on
the left hand side. For instance, if $X \xhookrightarrow{1/5} XX$ and $X \xhookrightarrow{4/5} \varepsilon$, then $A(X, X) = 2/5$. Note
that $A$ is nonnegative. The matrix $A$ plays a crucial role in the analysis of pPDA and
related models (see e.g. [20]) and in the theory of branching processes [21]. We have
the following lemma:



Lemma 13. Let $\Delta$ be an almost surely terminating, strongly connected pBPA. Then
there is a positive vector $u \in \mathbb{R}_+^\Gamma$ such that $A \cdot u \le u$, where $\le$ is meant componentwise.
All such vectors $u$ satisfy $\frac{u_{\min}}{u_{\max}} \ge p_{\min}^{|\Gamma|}$, where $p_{\min}$ denotes the least rule probability
in $\Delta$, and $u_{\min}$ and $u_{\max}$ denote the least and the greatest component of $u$, respectively.

Proof. Let $X, Y \in \Gamma$. Since $\Delta$ is strongly connected, there is a sequence $X =
X_1, X_2, \ldots, X_n = Y$ with $n \ge 1$ such that $X_i$ depends directly on $X_{i+1}$ for all
$1 \le i \le n-1$. A straightforward induction on $n$ shows that $A^{n-1}(X, Y) \ne 0$; i.e., $A$
is irreducible. The assumption that $\Delta$ is almost surely terminating implies that the
spectral radius of $A$ is less than or equal to one, see, e.g., Section 8.1 of [20].
Perron-Frobenius theory (see, e.g., [1]) then implies that there is a positive vector $u \in \mathbb{R}_+^\Gamma$
such that $A \cdot u \le u$; e.g., one can take for $u$ the dominant eigenvector of $A$.

Let $A \cdot u \le u$. It remains to show that $\frac{u_{\min}}{u_{\max}} \ge p_{\min}^{|\Gamma|}$. The proof is essentially given
in [14], we repeat it for convenience. W.l.o.g. let $\Gamma = \{X_1, \ldots, X_{|\Gamma|}\}$. We write $u_i$ for
$u(X_i)$. W.l.o.g. let $u_1 = u_{\max}$ and $u_{|\Gamma|} = u_{\min}$. Since $\Delta$ is strongly connected, there
is a sequence $1 = r_1, r_2, \ldots, r_q = |\Gamma|$ with $q \le |\Gamma|$ such that $X_{r_{j+1}}$ depends directly on $X_{r_j}$ for
all $j$. We have

\[
\frac{u_{\min}}{u_{\max}} = \frac{u_{|\Gamma|}}{u_1} = \frac{u_{r_q}}{u_{r_{q-1}}} \cdots \frac{u_{r_2}}{u_{r_1}} .
\]

By the pigeonhole principle there is $j$ with $2 \le j \le q$ such that

\[
\frac{u_{\min}}{u_{\max}} \ge \left(\frac{u_s}{u_t}\right)^{q-1} \ge \left(\frac{u_s}{u_t}\right)^{|\Gamma|} \quad \text{where } s := r_j \text{ and } t := r_{j-1}. \qquad (7)
\]

We have $A \cdot u \le u$, which implies $A(X_s, X_t) \cdot u_t \le u_s$ and so $A(X_s, X_t) \le u_s/u_t$. On
the other hand, since $X_s$ depends directly on $X_t$, we clearly have $p_{\min} \le A(X_s, X_t)$. Combining
those inequalities with (7) yields $\frac{u_{\min}}{u_{\max}} \ge (A(X_s, X_t))^{|\Gamma|} \ge p_{\min}^{|\Gamma|}$. □
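The matrix $A$ and the two conditions of Lemma 13 are easy to evaluate for a concrete pBPA. The sketch below is illustrative only: the rule encoding (a dictionary from a one-character symbol to a list of `(probability, right_hand_side)` pairs), the helper names, and the two-symbol example are assumptions made for this example.

```python
from fractions import Fraction

def expectation_matrix(rules):
    """A[X][Y] = sum over rules X -> alpha (probability p) of p * (occurrences of Y in alpha)."""
    symbols = sorted(rules)
    A = {X: {Y: Fraction(0) for Y in symbols} for X in symbols}
    for X, alternatives in rules.items():
        for p, alpha in alternatives:
            for Y in alpha:
                A[X][Y] += p
    return A

def satisfies_lemma_13(rules, u):
    """Check A*u <= u componentwise and u_min/u_max >= p_min^|Gamma| for a given vector u."""
    A = expectation_matrix(rules)
    au_leq_u = all(sum(A[X][Y] * u[Y] for Y in u) <= u[X] for X in u)
    p_min = min(p for alternatives in rules.values() for p, _ in alternatives)
    ratio_ok = min(u.values()) / max(u.values()) >= p_min ** len(rules)
    return au_leq_u and ratio_ok

half = Fraction(1, 2)
# The example from the text: X -> XX with probability 1/5, X -> eps with probability 4/5.
print(expectation_matrix({"X": [(Fraction(1, 5), "XX"), (Fraction(4, 5), "")]}))
# A hypothetical strongly connected pBPA: X -> Y | eps, Y -> XX | eps, all with probability 1/2.
rules = {"X": [(half, "Y"), (half, "")], "Y": [(half, "XX"), (half, "")]}
print(satisfies_lemma_13(rules, {"X": Fraction(3, 4), "Y": Fraction(1)}))
```

The second example has spectral radius below one, so many vectors $u$ satisfy $A \cdot u \le u$; the check only confirms the lemma's conclusion for the particular $u$ that is passed in.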



Given a relaxed pBPA $\Delta$ and vector $u \in \mathbb{R}_+^\Gamma$, we say that $\Delta$ is $u$-progressive, if $\Delta$
has, for all $X \in \Gamma$, a rule $X \xhookrightarrow{p} \alpha$ such that $|u(X) - \#(\alpha) \bullet u| \ge u_{\min}/2$. The following
lemma states that, intuitively, any pBPA can be transformed into a $u$-progressive
relaxed pBPA that is at least as fast but no more than $|\Gamma|$ times faster.
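The definition of $u$-progressiveness can be tested mechanically. The following sketch uses the same hypothetical rule encoding as above (symbol to list of `(probability, right_hand_side)` pairs); the two example systems are assumptions chosen to show a non-progressive pBPA and the relaxed pBPA obtained by contracting the derivation sequence $X \hookrightarrow Y,\ Y \hookrightarrow \varepsilon$ into $X \hookrightarrow \varepsilon$.

```python
from fractions import Fraction

def weight(alpha, u):
    """#(alpha) . u, the total weight of the symbols in alpha."""
    return sum((u[Y] for Y in alpha), Fraction(0))

def is_u_progressive(rules, u):
    """True if every symbol X has a rule X -> alpha with |u(X) - #(alpha).u| >= u_min / 2."""
    u_min = min(u.values())
    return all(
        any(abs(u[X] - weight(alpha, u)) >= u_min / 2 for _, alpha in alternatives)
        for X, alternatives in rules.items()
    )

one = Fraction(1)
u = {"X": one, "Y": one}
original = {"X": [(one, "Y")], "Y": [(one, "")]}      # the only X-rule keeps the weight unchanged
contracted = {"X": [(one, "")], "Y": [(one, "")]}     # X -> Y, Y -> eps contracted to X -> eps
print(is_u_progressive(original, u), is_u_progressive(contracted, u))
```

In this small example the contracted system terminates in fewer steps than the original one, in line with the "at least as fast" part of the informal statement.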



Lemma 14. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$. Let
$p_{\min}$ denote the least rule probability in $\Delta$, and let $u \in \mathbb{R}_+^\Gamma$ with $A_\Delta \cdot u \le u$. Then one
can construct a $u$-progressive, almost surely terminating relaxed pBPA $\Delta'$ with stack
alphabet $\Gamma$ such that for all $\alpha_0 \in \Gamma^*$ and for all $a \ge 0$

\[
\mathcal{P}'(T_{\alpha_0} \ge a) \le \mathcal{P}(T_{\alpha_0} \ge a) \le \mathcal{P}'(T_{\alpha_0} \ge a/|\Gamma|) ,
\]

where $\mathcal{P}$ and $\mathcal{P}'$ are the probability measures associated with $\Delta$ and $\Delta'$, respectively.
Furthermore, the least rule probability in $\Delta'$ is at least $p_{\min}^{|\Gamma|}$, and $A_{\Delta'} \cdot u \le u$. Finally,
if $A_\Delta \cdot u = u$, then $A_{\Delta'} \cdot u = u$.

Proof. A sequence of transitions $X_1 \xhookrightarrow{p_1} \alpha_1, \ldots, X_n \xhookrightarrow{p_n} \alpha_n$ is called derivation sequence
from $X_1$ to $\alpha_n$, if for all $i \in \{2, \ldots, n\}$ the symbol $X_i \in \Gamma$ occurs in $\alpha_{i-1}$. The
word induced by a derivation sequence $X_1 \xhookrightarrow{p_1} \alpha_1, \ldots, X_n \xhookrightarrow{p_n} \alpha_n$ is obtained by taking
$\alpha_1$, replacing an occurrence of $X_2$ by $\alpha_2$, then replacing an occurrence of $X_3$ by $\alpha_3$,
etc., and finally replacing an occurrence of $X_n$ by $\alpha_n$.

Given a pBPA $\Delta$ and a derivation sequence
$s = (X_1 \xhookrightarrow{p_1} \alpha_1^{(1)} X_2 \alpha_1^{(2)},\ X_2 \xhookrightarrow{p_2} \alpha_2,\ \ldots,\ X_n \xhookrightarrow{p_n} \alpha_n)$ with $X_i \ne X_j$ for all $1 \le i < j \le n$,
we define the contraction $Con(s)$ of $s$, a set of $X_1$-transitions with possibly more than
two symbols on the right hand side. The contraction $Con(s)$ will include a rule $X_1 \hookrightarrow \gamma$,
where $\gamma$ is the word induced by $s$. We define $Con(s)$ inductively over the length $n$ of $s$.
If $n = 1$, then $Con(s) = \{X_1 \xhookrightarrow{p_1} \alpha_1^{(1)} X_2 \alpha_1^{(2)}\}$. If $n \ge 2$, let $s' = (X_2 \xhookrightarrow{p_2} \alpha_2, \ldots, X_n \xhookrightarrow{p_n} \alpha_n)$
and define

\[
S_2 := \{X_2 \xhookrightarrow{q} \beta \mid X_2 \xhookrightarrow{q} \beta \text{ is a rule in } \Delta\} \setminus \{X_2 \xhookrightarrow{p_2} \alpha_2\} \cup Con(s') ; \qquad (8)
\]

i.e., $S_2$ is the set of $X_2$-transitions in $\Delta$ with $X_2 \xhookrightarrow{p_2} \alpha_2$ replaced by $Con(s')$. W.l.o.g.
assume $S_2 = \{X_2 \xhookrightarrow{q_1} \beta_1, \ldots, X_2 \xhookrightarrow{q_k} \beta_k\}$. Then we define

\[
Con(s) := \bigl\{ X_1 \xhookrightarrow{p_1 q_1} \alpha_1^{(1)} \beta_1 \alpha_1^{(2)},\ \ldots,\ X_1 \xhookrightarrow{p_1 q_k} \alpha_1^{(1)} \beta_k \alpha_1^{(2)} \bigr\} .
\]

The following properties are easy to show by induction on $n$:

(a) $Con(s)$ contains $X_1 \hookrightarrow \gamma$, where $\gamma$ is the word induced by $s$.

(b) The rule probabilities are at least $p_{\min}^n$.

(c) Let $\Delta'$ be the relaxed pBPA obtained from $\Delta$ by replacing $X_1 \xhookrightarrow{p_1} \alpha_1^{(1)} X_2 \alpha_1^{(2)}$ with
$Con(s)$. Then each path in $M_{\Delta'}$ corresponds in a straightforward way to a path
in $M_\Delta$, namely to the path obtained by "re-expanding" the contractions. The
corresponding path in $M_\Delta$ has the same probability and is not shorter but at most
$|\Gamma|$ times longer than the one in $M_{\Delta'}$.

(d) Let $\Delta'$ be as in (c). Then $A_{\Delta'} \cdot u \le u$. Let us prove that explicitly. The induction
base ($n = 1$) is trivial. For the induction step, using the definition for $S_2$
in (8) and $S_2 = \{X_2 \xhookrightarrow{q_1} \beta_1, \ldots, X_2 \xhookrightarrow{q_k} \beta_k\}$, we know by the induction hypothesis
that $\sum_{i=1}^{k} q_i \cdot \#(\beta_i) \bullet u \le u(X_2)$. This implies

\[
\sum_{i=1}^{k} p_1 q_i \cdot \#(\alpha_1^{(1)} \beta_i \alpha_1^{(2)}) \bullet u \le p_1 \cdot \#(\alpha_1^{(1)} X_2 \alpha_1^{(2)}) \bullet u , \text{ and hence}
\]

\[
(A_{\Delta'} \cdot u)(X_1) \le (A_\Delta \cdot u)(X_1) \le u(X_1) .
\]

Since $A_\Delta$ and $A_{\Delta'}$ may differ only in the $X_1$-row, we have $A_{\Delta'} \cdot u \le u$.

(e) Let $\Delta'$ be as in (c) and (d). If $A_\Delta \cdot u = u$, then $A_{\Delta'} \cdot u = u$. This follows as in (d),
with the inequality signs replaced by equality.

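Associate to each symbol $X_1 \in \Gamma$ a shortest derivation sequence

\[
c(X_1) = (X_1 \hookrightarrow \alpha_1, \ldots, X_{n-1} \hookrightarrow \alpha_{n-1}, X_n \hookrightarrow \varepsilon)
\]

from $X_1$ to $\varepsilon$. Since $\Delta$ is almost surely terminating, the length of $c(X_1)$ is at most $|\Gamma|$ for
all $X_1 \in \Gamma$. Let $X_1 \in \Gamma$, and let $\gamma_1$ denote the word induced by $c(X_1)$, and let $\gamma_2$ denote
the word induced by the derivation sequence $c_2(X_1) := (X_1 \hookrightarrow \alpha_1, \ldots, X_{n-1} \hookrightarrow \alpha_{n-1})$.
We have $\#(\gamma_2) \bullet u = \#(\gamma_1) \bullet u + u(X_n) \ge \#(\gamma_1) \bullet u + u_{\min}$, so we can choose $\gamma \in
\{\gamma_1, \gamma_2\}$ such that $|u(X_1) - \#(\gamma) \bullet u| \ge u_{\min}/2$. Choose $\hat{c}(X_1) \in \{c(X_1), c_2(X_1)\}$ such
that $\hat{c}(X_1)$ induces $\gamma$. (Of course, if $c_2(X_1)$ has length zero, take $\hat{c}(X_1) = c(X_1)$.) Note
that $(X_1 \hookrightarrow \gamma) \in Con(\hat{c}(X_1))$.

The relaxed pBPA $\Delta'$ from the statement of the lemma is obtained by replacing,
for all $X_1 \in \Gamma$, the first rule of $\hat{c}(X_1)$ with $Con(\hat{c}(X_1))$. The properties (a)-(e) from
above imply:

(a) The relaxed pBPA $\Delta'$ is $u$-progressive.

(b) The rule probabilities are at least $p_{\min}^{|\Gamma|}$.

(c) For each finite path $w'$ in $M_{\Delta'}$ from some $\alpha_0 \in \Gamma^*$ to $\varepsilon$ there is a finite path $w$
in $M_\Delta$ from $\alpha_0$ to $\varepsilon$ such that $|w'| \le |w| \le |\Gamma| \cdot |w'|$ and $\mathcal{P}'(w') = \mathcal{P}(w)$. Hence,
$\mathcal{P}'(T_{\alpha_0} < a/|\Gamma|) \le \mathcal{P}(T_{\alpha_0} < a) \le \mathcal{P}'(T_{\alpha_0} < a)$ holds for all $a \ge 0$, which implies
$\mathcal{P}'(T_{\alpha_0} \ge a) \le \mathcal{P}(T_{\alpha_0} \ge a) \le \mathcal{P}'(T_{\alpha_0} \ge a/|\Gamma|)$.

(d) We have $A_{\Delta'} \cdot u \le u$.

(e) If $A_\Delta \cdot u = u$, then $A_{\Delta'} \cdot u = u$.

This completes the proof of the lemma. □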

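Proposition 15. Let $\Delta$ be an almost surely terminating relaxed pBPA with stack
alphabet $\Gamma$. Let $u \in \mathbb{R}_+^\Gamma$ be such that $u_{\max} = 1$ and $A_\Delta \cdot u \le u$ and $\Delta$ is $u$-progressive.
Let $p_{\min}$ denote the least rule probability in $\Delta$. Let $C := 17|\Gamma|/(p_{\min} \cdot u_{\min}^2)$. Then for
each $k \in \mathbb{N}_0$ there is $n_0 \in \mathbb{N}$ such that

\[
\mathcal{P}\bigl(T_{\alpha_0} \ge n^{2k+2}/(2|\Gamma|)\bigr) \le C/n \quad \text{for all } n \ge n_0 \text{ and for all } \alpha_0 \in \Gamma^* \text{ with } 1 \le |\alpha_0| \le n^k .
\]

Proof. For each $X \in \Gamma$ we define a function $g_X : \mathbb{R} \to \mathbb{R}$ by setting

\[
g_X(\theta) := \sum_{X \xhookrightarrow{p} \alpha} p \cdot \exp\bigl(-\theta \cdot (-u(X) + \#(\alpha) \bullet u)\bigr).
\]

The following lemma states important properties of $g_X$.

Lemma 16. The following holds for all $X \in \Gamma$:

(a) For all $\theta > 0$ we have $1 = g_X(0) < g_X(\theta)$.

(b) For all $\theta > 0$ we have $0 \le g_X'(0) < g_X'(\theta)$.

(c) For all $\theta \ge 0$ we have $0 < g_X''(\theta)$. In particular, $g_X''(0) \ge p_{\min} \cdot u_{\min}^2/4$.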

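Proof (Proof of the lemma).

(a) Clearly, $g_X(0) = 1$. The inequality $g_X(0) < g_X(\theta)$ follows from (b).

(b) We have:

\[
\begin{aligned}
g_X(\theta) &= \sum_{X \xhookrightarrow{p} \alpha} p \cdot \exp\bigl(-\theta \cdot (-u(X) + \#(\alpha) \bullet u)\bigr) \\
g_X'(\theta) &= \sum_{X \xhookrightarrow{p} \alpha} p \cdot \bigl(u(X) - \#(\alpha) \bullet u\bigr) \cdot \exp\bigl(-\theta \cdot (-u(X) + \#(\alpha) \bullet u)\bigr)
\end{aligned}
\]

Let $A(X)$ denote the $X$-row of $A$, i.e., the vector $v \in \mathbb{R}^\Gamma$ such that $v(Y) = A(X, Y)$.
Then $A \cdot u \le u$ implies

\[
\begin{aligned}
g_X'(0) &= \sum_{X \xhookrightarrow{p} \alpha} p \cdot \bigl(u(X) - \#(\alpha) \bullet u\bigr) \\
 &= u(X) - \sum_{X \xhookrightarrow{p} \alpha} p \cdot \#(\alpha) \bullet u = u(X) - A(X) \bullet u \\
 &\ge u(X) - u(X) = 0 .
\end{aligned}
\]

The inequality $g_X'(0) < g_X'(\theta)$ follows from (c).

(c) We have

\[
g_X''(\theta) = \sum_{X \xhookrightarrow{p} \alpha} p \cdot \bigl(u(X) - \#(\alpha) \bullet u\bigr)^2 \cdot \exp\bigl(-\theta \cdot (-u(X) + \#(\alpha) \bullet u)\bigr) > 0 .
\]

Since $\Delta$ is $u$-progressive, there is a rule $X \xhookrightarrow{p} \alpha$ with $|u(X) - \#(\alpha) \bullet u| \ge u_{\min}/2$.
Hence, for $\theta = 0$ we have $g_X''(0) \ge p_{\min} \cdot u_{\min}^2/4$.

This proves the lemma. □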
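The three quantities in Lemma 16 have closed forms, so the lemma can be sanity-checked on a small example. The sketch below evaluates $g_X$, $g_X'$ and $g_X''$ for the one-symbol pBPA from the proof sketch with the weight $u(X) = 1$, in which case $g_X(\theta) = \cosh(\theta)$. The rule encoding and the function name `g_derivs` are assumptions made for illustration.

```python
import math

def g_derivs(rules, u, X, theta):
    """Return (g_X(theta), g_X'(theta), g_X''(theta)) for rules given as (probability, rhs) pairs."""
    g = g1 = g2 = 0.0
    for p, alpha in rules[X]:
        d = u[X] - sum(u[Y] for Y in alpha)   # u(X) - #(alpha) . u
        e = math.exp(theta * d)                # exp(-theta * (-u(X) + #(alpha) . u))
        g += p * e
        g1 += p * d * e
        g2 += p * d * d * e
    return g, g1, g2

rules = {"X": [(0.5, "XX"), (0.5, "")]}
u = {"X": 1.0}
print(g_derivs(rules, u, "X", 0.0))   # values at theta = 0
print(g_derivs(rules, u, "X", 0.3))   # values at a positive theta
```

Here $p_{\min} \cdot u_{\min}^2/4 = 1/8$, and the second derivative at $0$ is well above that threshold, as part (c) requires.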

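Let in the following $\theta > 0$. Given a run $w \in Run(\alpha_0)$ and $i \ge 0$, we write $X^{(i)}(w)$
for the symbol $X \in \Gamma$ for which $w(i) = X\alpha$. Define

\[
m_\theta^{(i)}(w) =
\begin{cases}
\exp\bigl(-\theta \cdot \#(w(i)) \bullet u\bigr) \cdot \prod_{j=0}^{i-1} \dfrac{1}{g_{X^{(j)}(w)}(\theta)} & \text{if } i = 0 \text{ or } w(i-1) \ne \varepsilon \\[2ex]
m_\theta^{(i-1)}(w) & \text{otherwise}
\end{cases}
\]

Lemma 17. $m_\theta^{(0)}, m_\theta^{(1)}, \ldots$ is a martingale.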

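Proof (Proof of the lemma). Let us fix a path $v \in FPath(\alpha_0)$ of length $i \ge 1$ and let
$w$ be an arbitrary run of $Run(v)$. First assume that $v(i-1) = X\alpha \in \Gamma\Gamma^*$. Then we
have:

\[
\begin{aligned}
\mathbb{E}\bigl[m_\theta^{(i)} \mid Run(v)\bigr]
 &= \mathbb{E}\Bigl[\exp\bigl(-\theta \cdot \#(w(i)) \bullet u\bigr) \cdot \prod_{j=0}^{i-1} \frac{1}{g_{X^{(j)}(w)}(\theta)} \Bigm| Run(v)\Bigr] \\
 &= \sum_{X \xhookrightarrow{p} \alpha} p \cdot \exp\bigl(-\theta \cdot (\#(w(i-1)) - 1_X + \#(\alpha)) \bullet u\bigr) \cdot \prod_{j=0}^{i-1} \frac{1}{g_{X^{(j)}(w)}(\theta)} \\
 &= \sum_{X \xhookrightarrow{p} \alpha} p \cdot \exp\bigl(-\theta \cdot (\#(w(i-1)) \bullet u - u(X) + \#(\alpha) \bullet u)\bigr) \cdot \prod_{j=0}^{i-1} \frac{1}{g_{X^{(j)}(w)}(\theta)} \\
 &= \exp\bigl(-\theta \cdot \#(w(i-1)) \bullet u\bigr) \cdot \sum_{X \xhookrightarrow{p} \alpha} p \cdot \exp\bigl(-\theta \cdot (-u(X) + \#(\alpha) \bullet u)\bigr) \cdot \prod_{j=0}^{i-1} \frac{1}{g_{X^{(j)}(w)}(\theta)} \\
 &= \exp\bigl(-\theta \cdot \#(w(i-1)) \bullet u\bigr) \cdot g_{X^{(i-1)}(w)}(\theta) \cdot \prod_{j=0}^{i-1} \frac{1}{g_{X^{(j)}(w)}(\theta)} \\
 &= \exp\bigl(-\theta \cdot \#(w(i-1)) \bullet u\bigr) \cdot \prod_{j=0}^{i-2} \frac{1}{g_{X^{(j)}(w)}(\theta)} \\
 &= m_\theta^{(i-1)}(w) .
\end{aligned}
\]

If $v(i-1) = \varepsilon$, then for every $w \in Run(v)$ we have $m_\theta^{(i)}(w) = m_\theta^{(i-1)}(w)$. Hence,
$m_\theta^{(0)}, m_\theta^{(1)}, \ldots$ is a martingale. □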



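Since $\theta > 0$ and since $g_{X^{(j)}(w)}(\theta) \ge 1$ by Lemma 16(a), we have $0 \le m_\theta^{(i)}(w) \le 1$, so
the martingale is bounded. Since, furthermore, $T_{\alpha_0}$ (we write only $T$ in the following)
is finite with probability 1, it follows using Doob's Optional-Stopping Theorem (see
Theorem 10.10 (ii) of [29]) that $m_\theta^{(0)} = \mathbb{E}\bigl[m_\theta^{(T)}\bigr]$. Hence we have for each $n \in \mathbb{N}$:

\[
\begin{aligned}
\exp(-\theta \cdot u_{\max} \cdot n^k)
 &\le \exp\bigl(-\theta \cdot u \bullet \#(\alpha_0)\bigr) = m_\theta^{(0)} \\
 &= \mathbb{E}\bigl[m_\theta^{(T)}\bigr] && \text{(by optional-stopping)} \\
 &= \mathbb{E}\Bigl[\exp(-\theta \cdot 0) \cdot \prod_{j=0}^{T-1} \frac{1}{g_{X^{(j)}(w)}(\theta)}\Bigr] \\
 &\le \mathbb{E}\Bigl[\prod_{j=0}^{T-1} \frac{1}{g_X(\theta)}\Bigr] && \text{(for some } X \in \Gamma\text{)} \\
 &= \sum_{i=0}^{\infty} \mathcal{P}(T = i) \cdot \frac{1}{g_X(\theta)^i} \\
 &\le \sum_{i=0}^{\lceil n^{2k+2}/(2|\Gamma|) \rceil - 1} \mathcal{P}(T = i) \cdot 1
  + \sum_{i=\lceil n^{2k+2}/(2|\Gamma|) \rceil}^{\infty} \mathcal{P}(T = i) \cdot \frac{1}{g_X(\theta)^{n^{2k+2}/(2|\Gamma|)}} && \text{(Lemma 16 (a))} \\
 &= 1 - \mathcal{P}\bigl(T \ge n^{2k+2}/(2|\Gamma|)\bigr)
  + \mathcal{P}\bigl(T \ge n^{2k+2}/(2|\Gamma|)\bigr) \cdot \frac{1}{g_X(\theta)^{n^{2k+2}/(2|\Gamma|)}}
\end{aligned}
\]

Rearranging the inequality, we obtain

\[
\mathcal{P}\bigl(T \ge n^{2k+2}/(2|\Gamma|)\bigr) \le \frac{1 - \exp(-\theta \cdot u_{\max} \cdot n^k)}{1 - g_X(\theta)^{-n^{2k+2}/(2|\Gamma|)}} . \qquad (9)
\]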



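For the following we set $\theta = n^{-(k+1)}$. We want to give an upper bound for the right
hand side of (9). To this end we will show:

\[
\lim_{n \to \infty} \frac{\bigl(1 - \exp(-n^{-(k+1)} \cdot u_{\max} \cdot n^k)\bigr) \cdot n}{1 - g_X(n^{-(k+1)})^{-n^{2(k+1)}/(2|\Gamma|)}}
 \le \frac{1}{1 - \exp\bigl(-p_{\min} \cdot u_{\min}^2/(16|\Gamma|)\bigr)} \qquad (10)
\]

Combining (9) with (10), we obtain

\[
\begin{aligned}
\limsup_{n \to \infty}\ n \cdot \mathcal{P}\bigl(T \ge n^{2k+2}/(2|\Gamma|)\bigr)
 &\le \frac{1}{1 - \exp\bigl(-p_{\min} \cdot u_{\min}^2/(16|\Gamma|)\bigr)} \\
 &< \frac{1}{1 - \bigl(1 - \frac{16}{17} \cdot (p_{\min} \cdot u_{\min}^2/(16|\Gamma|))\bigr)} \\
 &= 17|\Gamma| / (p_{\min} \cdot u_{\min}^2) ,
\end{aligned}
\]

which implies the proposition.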

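To prove (10), we compute limits for the numerator and the denominator separately.
For the numerator, we use l'Hopital's rule to obtain:

\[
\lim_{n \to \infty} \frac{1 - \exp(-u_{\max} \cdot n^{-1})}{n^{-1}}
 = \lim_{n \to \infty} \frac{-u_{\max} \cdot n^{-2} \cdot \exp(-u_{\max} \cdot n^{-1})}{-n^{-2}} = u_{\max} = 1 .
\]

For the denominator of (10) we consider first the following limit:

\[
\begin{aligned}
\lim_{n \to \infty} \frac{1}{2|\Gamma|} \cdot n^{2(k+1)} \cdot \ln g_X(n^{-(k+1)})
 &= \frac{1}{2|\Gamma|} \lim_{n \to \infty} \frac{\ln g_X(n^{-(k+1)})}{n^{-2(k+1)}} \\
 &= \frac{1}{2|\Gamma|} \lim_{n \to \infty} \frac{g_X'(n^{-(k+1)}) \cdot (-(k+1)) \cdot n^{-k-2}}{g_X(n^{-(k+1)}) \cdot (-2(k+1)) \cdot n^{-2k-3}} && \text{(l'Hopital's rule)} \\
 &= \frac{1}{4|\Gamma|} \lim_{n \to \infty} \frac{g_X'(n^{-(k+1)})}{n^{-(k+1)}} && \text{(by Lemma 16 (a))} .
\end{aligned}
\]

If $g_X'(0) > 0$, then the limit is $+\infty$. Otherwise, by Lemma 16 (b), we have $g_X'(0) = 0$
and hence the limit equals

\[
\begin{aligned}
\frac{1}{4|\Gamma|} \lim_{n \to \infty} \frac{g_X''(n^{-(k+1)}) \cdot (-(k+1)) \cdot n^{-k-2}}{(-(k+1)) \cdot n^{-k-2}} && \text{(l'Hopital's rule)} \\
 = \frac{1}{4|\Gamma|} \cdot g_X''(0) \ge p_{\min} \cdot u_{\min}^2/(16|\Gamma|) && \text{(by Lemma 16 (c))} .
\end{aligned}
\]

This proves (10) and thus completes the proof of Proposition 15. □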

The following lemma serves as induction base for the proof of Proposition 10. 

Lemma 18. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$.
Assume that all SCCs of $\Delta$ are bottom SCCs. Let $p_{\min}$ denote the least rule probability
in $\Delta$. Let $D := 17|\Gamma|/p_{\min}^{3|\Gamma|}$. Then for each $k \in \mathbb{N}_0$ there is $n_0 \in \mathbb{N}$ such that

\[
\mathcal{P}(T_{\alpha_0} \ge n^{2k+2}/2) \le D/n \quad \text{for all } n \ge n_0 \text{ and for all } \alpha_0 \in \Gamma^* \text{ with } 1 \le |\alpha_0| \le n^k .
\]

Proof. Decompose $\Gamma$ into its SCCs, say $\Gamma = \Gamma_1 \cup \cdots \cup \Gamma_s$, and let the pBPA $\Delta_i$ be
obtained by restricting $\Delta$ to the $\Gamma_i$-symbols. For each $i \in \{1, \ldots, s\}$, Lemma 13 gives a
vector $u_i \in \mathbb{R}_+^{\Gamma_i}$. W.l.o.g. we can assume for each $i$ that the largest component of $u_i$ is
equal to 1, because $u_i$ can be multiplied with any positive scalar without changing the
properties guaranteed by Lemma 13. If the vectors $u_i$ are assembled (in the obvious
way) to the vector $u \in \mathbb{R}_+^\Gamma$, the assertions of Lemma 13 carry over; i.e., we have
$A_\Delta \cdot u \le u$ and $u_{\max} = 1$ and $u_{\min} \ge p_{\min}^{|\Gamma|}$. Let $\Delta'$ be the $u$-progressive relaxed pBPA
from Lemma 14, and denote by $\mathcal{P}'$ and $p'_{\min}$ its associated probability measure and
least rule probability, respectively. Then we have:

\[
\begin{aligned}
\mathcal{P}(T_{\alpha_0} \ge n^{2k+2}/2) &\le \mathcal{P}'\bigl(T_{\alpha_0} \ge n^{2k+2}/(2|\Gamma|)\bigr) && \text{(by Lemma 14)} \\
 &\le 17|\Gamma| / (p'_{\min} \cdot u_{\min}^2 \cdot n) && \text{(by Proposition 15)} \\
 &\le 17|\Gamma| / (p'_{\min} \cdot p_{\min}^{2|\Gamma|} \cdot n) && \text{(as argued above)} \\
 &\le 17|\Gamma| / (p_{\min}^{3|\Gamma|} \cdot n) && \text{(by Lemma 14)} .
\end{aligned}
\]

□

Now we are ready to prove Proposition 10, which is restated here. 

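Proposition 10. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$.
Assume that $X_0$ depends on all $X \in \Gamma \setminus \{X_0\}$. Let $p_{\min} = \min\{p \mid X \xhookrightarrow{p} \alpha \text{ in } \Delta\}$. Let
$h$ denote the height of the DAG of SCCs. Then there is $n_0 \in \mathbb{N}$ such that

\[
\mathcal{P}(T_{X_0} \ge n) \le \frac{18h|\Gamma| / p_{\min}^{3|\Gamma|}}{n^{1/(2^{h+1}-2)}} \quad \text{for all } n \ge n_0 .
\]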

Proof. Let $D$ be the $D$ from Lemma 18. We show by induction on $h$:

\[
\mathcal{P}\bigl(T_{X_0} \ge n^{2^{h+1}-2}\bigr) \le \frac{hD}{n} \quad \text{for almost all } n \in \mathbb{N}. \qquad (11)
\]

Note that (11) implies the proposition. The case $h = 1$ (induction base) is implied
by Lemma 18. Let $h \ge 2$. Partition $\Gamma$ into $\Gamma_{high} \cup \Gamma_{low}$ such that $\Gamma_{low}$ contains the
variables of the SCCs of depth $h$ in the DAG of SCCs, and $\Gamma_{high}$ contains the other
variables (in "higher" SCCs). If $X_0 \in \Gamma_{low}$, then we can restrict $\Delta$ to the variables that
are in the same SCC as $X_0$, and Lemma 18 implies (11). So we can assume $X_0 \in \Gamma_{high}$.
Assume for a moment that $T_{X_0} \ge n^{2^{h+1}-2}$ holds for a run $w \in Run(X_0)$; i.e., we
have:

\[
\begin{aligned}
n^{2^{h+1}-2} &\le |\{i \in \mathbb{N}_0 \mid w(i) \in \Gamma\Gamma^*\}| \\
 &= |\{i \in \mathbb{N}_0 \mid w(i) \in \Gamma_{high}\Gamma^*\}| + |\{i \in \mathbb{N}_0 \mid w(i) \in \Gamma_{low}\Gamma^*\}| .
\end{aligned}
\]

It follows that one of the following events is true for $w$:

(a) At least $n^{2^h-2}$ steps in $w$ have a $\Gamma_{high}$-symbol on top of the stack. More formally,

\[
|\{i \in \mathbb{N}_0 \mid w(i) \in \Gamma_{high}\Gamma^*\}| \ge n^{2^h-2} .
\]

(b) Event (a) is not true, but at least $n^{2^{h+1}-2} - n^{2^h-2}$ steps in $w$ have a $\Gamma_{low}$-symbol
on top of the stack. More formally,

\[
|\{i \in \mathbb{N}_0 \mid w(i) \in \Gamma_{high}\Gamma^*\}| < n^{2^h-2} \quad \text{and} \quad
|\{i \in \mathbb{N}_0 \mid w(i) \in \Gamma_{low}\Gamma^*\}| \ge n^{2^{h+1}-2} - n^{2^h-2} .
\]

In order to give bounds on the probabilities of events (a) and (b), it is convenient to
"reshuffle" the execution order of runs in the following way: Whenever a rule $X \hookrightarrow \alpha$
is executed, we do not replace the $X$-symbol on top of the stack by $\alpha$, but instead
we push only the $\Gamma_{high}$-symbols in $\alpha$ on top of the stack, whereas the $\Gamma_{low}$-symbols
in $\alpha$ are added to the bottom of the stack. Since $\Delta$ is a pBPA and thus does not have
control states, the reshuffling of the execution order does not influence the distribution
of the termination time. The advantage of this execution order is that each run can be
decomposed into two phases:

(1) In the first phase, the symbol on the top of the stack is always a $\Gamma_{high}$-symbol.
When rules are executed, $\Gamma_{low}$-symbols may be produced, which are added to the
bottom of the stack.

(2) In the second phase, the stack consists of $\Gamma_{low}$-symbols exclusively. Notice that by
definition of $\Gamma_{low}$, no new $\Gamma_{high}$-symbols can be produced.

In terms of those phases, the above events (a) and (b) can be reformulated as follows:

(a) The first phase of $w$ consists of at least $n^{2^h-2}$ steps. The probability of this event
is equal to

\[
\mathcal{P}_{\Delta_{high}}\bigl(T_{X_0} \ge n^{2^h-2}\bigr) ,
\]

where $\Delta_{high}$ is the pBPA obtained from $\Delta$ by deleting all $\Gamma_{low}$-symbols from the
right hand sides of the rules and deleting all rules with $\Gamma_{low}$-symbols on the left
hand side, and $\mathcal{P}_{\Delta_{high}}$ is its associated probability measure.

(b) The first phase of $w$ consists of fewer than $n^{2^h-2}$ steps (which implies that at
most $n^{2^h-2}$ $\Gamma_{low}$-symbols are produced during the first phase), and the second
phase consists of at least $n^{2^{h+1}-2} - n^{2^h-2}$ steps. Therefore, the probability of the
event (b) is at most

\[
\max\bigl\{ \mathcal{P}_{\Delta_{low}}\bigl(T_{\alpha_0} \ge n^{2^{h+1}-2} - n^{2^h-2}\bigr) \bigm| \alpha_0 \in \Gamma_{low}^*,\ 1 \le |\alpha_0| \le n^{2^h-2} \bigr\} ,
\]

where $\Delta_{low}$ is the pBPA $\Delta$ restricted to the $\Gamma_{low}$-symbols, and $\mathcal{P}_{\Delta_{low}}$ is its
associated probability measure. Notice that $n^{2^{h+1}-2} - n^{2^h-2} \ge n^{2^{h+1}-2}/2$ for large
enough $n$. Furthermore, by the definition of $\Gamma_{low}$, the SCCs of $\Delta_{low}$ are all bottom
SCCs. Hence, by Lemma 18, the above maximum is at most $D/n$.

Summing up, we have for almost all $n \in \mathbb{N}$:

\[
\begin{aligned}
\mathcal{P}\bigl(T_{X_0} \ge n^{2^{h+1}-2}\bigr) &\le \mathcal{P}(\text{event (a)}) + \mathcal{P}(\text{event (b)}) \\
 &\le \mathcal{P}_{\Delta_{high}}\bigl(T_{X_0} \ge n^{2^h-2}\bigr) + D/n && \text{(as argued above)} \\
 &\le \frac{(h-1)D}{n} + \frac{D}{n} = \frac{hD}{n} && \text{(by the induction hypothesis)} .
\end{aligned}
\]

This completes the induction proof. □

6.3 Proof of Proposition 11 

The proof of Proposition 11 is similar to the proof of Proposition 10 from the previous 
subsection. Here is a restatement of Proposition 11. 

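Proposition 11. Let $\Delta$ be an almost surely terminating pBPA with stack alphabet $\Gamma$.
Assume that $X_0$ depends on all $X \in \Gamma \setminus \{X_0\}$. Assume $E[X_0] = \infty$. Then there is
$c > 0$ such that

\[
\frac{c}{\sqrt{n}} \le \mathcal{P}(T_{X_0} \ge n) \quad \text{for all } n \in \mathbb{N}.
\]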



Proof. For a square matrix $M$ denote by $\rho(M)$ the spectral radius of $M$, i.e., the greatest absolute value of its eigenvalues. Let $A_\Delta$ be the matrix from the previous subsection. We claim:

$$\rho(A_\Delta) = 1. \qquad (12)$$

The assumption that $\Delta$ is almost surely terminating implies that $\rho(A_\Delta) \le 1$, see, e.g., Section 8.1 of [20]. Assume for a contradiction that $\rho(A_\Delta) < 1$. Using standard theory of nonnegative matrices (see, e.g., [1]), this implies that the matrix inverse $B := (I - A_\Delta)^{-1}$ (here, $I$ denotes the identity matrix) exists; i.e., $B$ is finite in all components. It is shown in [16] that $E[X_0] = (B \cdot \mathbf{1})(X_0)$ (here, $\mathbf{1}$ denotes the vector with $\mathbf{1}(X) = 1$ for all $X$). This contradicts our assumption that $E[X_0] = \infty$. Hence, (12) is proved.
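The dichotomy behind (12) can be observed numerically on a one-symbol pBPA. The sketch below is only an illustration. It assumes that $A_\Delta(X, Y)$ is the expected number of $Y$-symbols on the right hand side of a rule applied to $X$ (the matrix is defined in the previous subsection, not here), and the helper names `mean_matrix` and `partial_expected_runtime` are hypothetical. It iterates $e \leftarrow \mathbf{1} + A_\Delta e$, i.e., it computes partial sums of the series for $(I - A_\Delta)^{-1} \cdot \mathbf{1}$.

```python
def mean_matrix(rules, symbols):
    """A[X][Y] = sum over rules (X, p, alpha) of p * (#occurrences of Y in alpha).

    Assumption: this is the matrix A_Delta referred to in the text.
    """
    A = {X: {Y: 0.0 for Y in symbols} for X in symbols}
    for X, p, alpha in rules:
        for Y in alpha:
            A[X][Y] += p
    return A


def partial_expected_runtime(A, symbols, iterations):
    """Return the iterate number `iterations` of e <- 1 + A e, starting from e = 0.

    This equals (I + A + ... + A^(iterations-1)) * 1.
    """
    e = {X: 0.0 for X in symbols}
    for _ in range(iterations):
        e = {X: 1.0 + sum(A[X][Y] * e[Y] for Y in symbols) for X in symbols}
    return e


symbols = ["X"]
# X -> XX with probability 1/4, X -> (empty) with probability 3/4: spectral radius 1/2.
subcritical = mean_matrix([("X", 0.25, ["X", "X"]), ("X", 0.75, [])], symbols)
# X -> XX with probability 1/2, X -> (empty) with probability 1/2: spectral radius 1.
critical = mean_matrix([("X", 0.5, ["X", "X"]), ("X", 0.5, [])], symbols)

finite = partial_expected_runtime(subcritical, symbols, 200)["X"]
growing = [partial_expected_runtime(critical, symbols, k)["X"] for k in (10, 100, 1000)]
print(finite)   # approaches 1 / (1 - 1/2) = 2
print(growing)  # the partial sums keep growing: the series diverges
```

In the subcritical example the iteration settles at the finite value $(1 - 1/2)^{-1}$, while in the critical example the partial sums grow linearly, matching $E[X_0] = \infty$.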

It follows from (12) and standard theory of nonnegative matrices [1] that $A_\Delta$ has a principal submatrix, say $A'$, which is irreducible and satisfies $\rho(A') = 1$. Let $\Gamma'$ be the subset of $\Gamma$ such that $A'$ is obtained from $A_\Delta$ by deleting all rows and columns which are not indexed by $\Gamma'$. Let $\Delta'$ be the pBPA with stack alphabet $\Gamma'$ such that $\Delta'$ is obtained from $\Delta$ by removing all rules with symbols from $\Gamma \setminus \Gamma'$ on the left hand side and removing all symbols from $\Gamma \setminus \Gamma'$ from all right hand sides. Clearly, $A_{\Delta'} = A'$, so $\rho(A_{\Delta'}) = 1$ and $A_{\Delta'}$ is irreducible. Since $\Delta'$ is a sub-pBPA of $\Delta$ and $X_0$ depends on all symbols in $\Gamma'$, it suffices to prove the proposition for $\Delta'$ and an arbitrary start symbol $X_0' \in \Gamma'$.

Therefore, w.l.o.g. we can assume in the following that $A_\Delta =: A$ is irreducible. Then it follows, using (12) and Perron-Frobenius theory [1], that there is a positive vector $u \in \mathbb{R}_+^{\Gamma}$ such that $A \cdot u = u$. W.l.o.g. we assume $u(X_0) = 1$. Using Lemma 14 we can assume w.l.o.g. that $\Delta$ is $u$-progressive. (The pBPA $\Delta$ may be relaxed.)

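As in the proof of Proposition 15, for each $X \in \Gamma$ we define a function $g_X : \mathbb{R} \to \mathbb{R}$ by setting

$$g_X(\theta) := \sum_{X \stackrel{p}{\hookrightarrow} \alpha} p \cdot \exp\bigl(-\theta \cdot (-u(X) + \#(\alpha) \cdot u)\bigr).$$

The following lemma states some properties of $g_X$.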
Lemma 19. The following holds for all $X \in \Gamma$:

(a) For all $\theta > 0$ we have $1 = g_X(0) < g_X(\theta)$.

(b) For all $\theta > 0$ we have $0 = g_X'(0) < g_X'(\theta)$.

(c) For all $\theta \ge 0$ we have $0 < g_X''(\theta)$.

(d) There is $c_2 > 0$ such that for all $0 < \theta \le 1$ we have $g_X'(\theta) \le c_2 \theta$.

(e) There is $c_3 > 1$ such that for all $n \in \mathbb{N}$ we have $g_X(1/\sqrt{n})^n \ge c_3$.

(f) There is $c_4 > 0$ such that for all $n \in \mathbb{N}$ we have $\dfrac{1/n}{1 - 1/g_X(1/\sqrt{n})} \le c_4$.
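For a concrete instance, consider the pBPA with the two rules $X \hookrightarrow XX$ and $X \hookrightarrow \varepsilon$, each with probability $1/2$, and $u(X) = 1$. Plugging into the definition gives $g_X(\theta) = \tfrac12 e^{-\theta} + \tfrac12 e^{\theta} = \cosh\theta$. The following sketch evaluates the quantities from items (d), (e) and (f) on a finite range of $n$. This example pBPA and the names `g`, `g_prime`, `item_d`, `item_e`, `item_f` are assumptions made for illustration, and a finite check is of course not a proof.

```python
import math


def g(theta):
    """g_X for the rules X -> XX and X -> (empty), both with probability 1/2, u(X) = 1."""
    # Each term is p * exp(-theta * (-u(X) + #(alpha) . u)).
    return 0.5 * math.exp(-theta * (-1 + 2)) + 0.5 * math.exp(-theta * (-1 + 0))


def g_prime(theta):
    """Derivative of g with respect to theta."""
    return -0.5 * math.exp(-theta) + 0.5 * math.exp(theta)


ns = range(1, 2001)
# Item (d): g'(theta) / theta at theta = 1/sqrt(n) should stay bounded.
item_d = max(g_prime(1 / math.sqrt(n)) * math.sqrt(n) for n in ns)
# Item (e): g(1/sqrt(n))^n should stay bounded away from 1.
item_e = min(g(1 / math.sqrt(n)) ** n for n in ns)
# Item (f): (1/n) / (1 - 1/g(1/sqrt(n))) should stay bounded.
item_f = max((1 / n) / (1 - 1 / g(1 / math.sqrt(n))) for n in ns)
print(item_d, item_e, item_f)
```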

Proof (of the lemma). The proof of items (a)-(c) follows exactly the proof of Lemma 16 and is therefore omitted. (For the equality $0 = g_X'(0)$ in (b) one uses $A \cdot u = u$.)

(d) It suffices to prove that $g_X'(\theta)/\theta$ is bounded for $\theta \to 0$. Using l'Hôpital's rule we have $\lim_{\theta \to 0} g_X'(\theta)/\theta = g_X''(0) > 0$.

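(e) Clearly, we have $g_X(1/\sqrt{n})^n > 1$ for all $n$. Furthermore, we have:

$$\begin{aligned}
\lim_{n\to\infty} \ln g_X(1/\sqrt{n})^n &= \lim_{n\to\infty} \frac{\ln g_X(n^{-1/2})}{1/n} \\
&= \frac{1}{2} \lim_{n\to\infty} \frac{g_X'(n^{-1/2})}{n^{-1/2}} && \text{(l'Hôpital's rule)} \\
&= \frac{g_X''(0)}{2} && \text{(l'Hôpital's rule)} \\
&> 0 && \text{(by (c)).}
\end{aligned}$$

Hence the claim follows.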
(f) The claim follows again from l'Hôpital's rule:

$$\lim_{n\to\infty} \frac{1/n}{1 - 1/g_X(n^{-1/2})} = \lim_{n\to\infty} \frac{-1/n^2}{\bigl(1/g_X(n^{-1/2})\bigr)^2 \cdot g_X'(n^{-1/2}) \cdot (-1/2)\, n^{-3/2}} = \lim_{n\to\infty} \frac{2 n^{-1/2}}{g_X'(n^{-1/2})} < \infty.$$

This completes the proof of the lemma.



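In the following, let $\theta > 0$. As in the proof of Proposition 15, given a run $w \in Run(X_0)$ and $i \ge 0$, we write $X^{(i)}(w)$ for the symbol $X \in \Gamma$ for which $w(i) = X\alpha$. Define

$$m_\theta^{(i)}(w) := \begin{cases}
\exp\bigl(-\theta \cdot \#(w(i)) \cdot u\bigr) \cdot \displaystyle\prod_{j=0}^{i-1} \frac{1}{g_{X^{(j)}(w)}(\theta)} & \text{if } i = 0 \text{ or } w(i-1) \ne \varepsilon \\[2ex]
m_\theta^{(i-1)}(w) & \text{otherwise.}
\end{cases}$$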



As in Lemma 17, one can show that the sequence $m_\theta^{(0)}, m_\theta^{(1)}, \ldots$ is a martingale. As in the proof of Proposition 15, Doob's Optional-Stopping Theorem implies $\exp(-\theta) = m_\theta^{(0)} = E\bigl[m_\theta^{(T_{X_0})}\bigr]$. Hence we have for each $n \in \mathbb{N}$ (writing $T$ for $T_{X_0}$):

$$\exp(-\theta) = E\bigl[m_\theta^{(T)}\bigr] = E\Bigl[\,\prod_{j=0}^{T-1} \frac{1}{g_{X^{(j)}}(\theta)}\Bigr] \qquad \text{(by optional-stopping)}$$
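The optional-stopping identity can be checked on the one-symbol pBPA with rules $X \hookrightarrow XX$ and $X \hookrightarrow \varepsilon$, each with probability $1/2$. There $u(X) = 1$ and every factor $g_{X^{(j)}}(\theta)$ equals $\cosh\theta$, so the identity reads $\exp(-\theta) = E[\cosh(\theta)^{-T}]$. The sketch below is an illustration under these assumptions; the function name `stopping_time_distribution` and the cut-off of 400 steps are hypothetical choices. It computes the distribution of $T$ exactly by dynamic programming over the stack height and compares both sides.

```python
import math


def stopping_time_distribution(max_steps):
    """Return [P(T = 0), ..., P(T = max_steps)] for the walk started at height 1.

    The height moves +1 or -1 with probability 1/2 each and the run ends at
    height 0; this is the stack height of the pBPA X -> XX, X -> (empty).
    """
    dist = [0.0] * (max_steps + 1)
    alive = {1: 1.0}  # probability mass of non-terminated runs, by stack height
    for step in range(1, max_steps + 1):
        nxt = {}
        for height, mass in alive.items():
            for new_height in (height - 1, height + 1):
                if new_height == 0:
                    dist[step] += 0.5 * mass
                else:
                    nxt[new_height] = nxt.get(new_height, 0.0) + 0.5 * mass
        alive = nxt
    return dist


theta = 0.5
g = math.cosh(theta)  # g_X(theta) for this pBPA with u(X) = 1
dist = stopping_time_distribution(400)
# E[ prod_{j<T} 1/g ] = sum_i P(T = i) / g^i; runs longer than 400 steps
# contribute at most g^(-400), which is negligible for this theta.
expectation = sum(p / g ** i for i, p in enumerate(dist))
print(expectation, math.exp(-theta))
```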



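Taking, on both sides, the derivative with respect to $\theta$ yields

$$\exp(-\theta) \le \sum_{i=1}^{\infty} i \cdot \mathcal{P}(T = i) \cdot \frac{g_{1,\theta}'(\theta)}{g_{0,\theta}(\theta)^{i+1}}, \qquad (13)$$

where $g_{0,\theta} = g_X$ and $g_{1,\theta} = g_Y$ for some $X, Y \in \Gamma$ possibly depending on $\theta$. The following lemma bounds an "upper" subseries of the right-hand side of (13).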

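Lemma 20. For all $\varepsilon > 0$ there is $a \in \mathbb{N}$ such that for all $n \in \mathbb{N}$ and $\theta = 1/\sqrt{n}$ we have

$$\sum_{i=an+1}^{\infty} i \cdot \mathcal{P}(T = i) \cdot \frac{g_{1,\theta}'(\theta)}{g_{0,\theta}(\theta)^{i+1}} \le \varepsilon.$$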



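Proof (of the lemma). By rearranging the series we get for all $n \in \mathbb{N}$ and $\theta = 1/\sqrt{n}$:

$$\sum_{i=an+1}^{\infty} i \cdot \mathcal{P}(T = i) \cdot \frac{g_{1,\theta}'(\theta)}{g_{0,\theta}(\theta)^{i+1}} \le \underbrace{an \cdot \mathcal{P}(T \ge an) \cdot \frac{g_{1,\theta}'(\theta)}{g_{0,\theta}(\theta)^{an}}}_{=:\ q_1} + \underbrace{\sum_{i=an}^{\infty} \mathcal{P}(T \ge i) \cdot \frac{g_{1,\theta}'(\theta)}{g_{0,\theta}(\theta)^{i}}}_{=:\ q_2}$$

We bound $q_1$ and $q_2$ separately. By Proposition 10 there is $c_1 > 0$ such that $\mathcal{P}(T \ge k) \le c_1/\sqrt{k}$. Hence we have, using Lemma 19 (d), (e):

$$q_1 \le \frac{\sqrt{an} \cdot c_1 \cdot c_2/\sqrt{n}}{c_3^{a}} = \frac{c_1 c_2 \sqrt{a}}{c_3^{a}},$$

and similarly,

$$q_2 \le \frac{c_1 c_2}{\sqrt{a} \cdot n} \sum_{i=an}^{\infty} \frac{1}{g_{0,\theta}(\theta)^{i}} = \frac{c_1 c_2}{\sqrt{a} \cdot n \cdot g_{0,\theta}(\theta)^{an} \cdot \bigl(1 - 1/g_{0,\theta}(\theta)\bigr)} \le \frac{c_1 c_2 c_4}{\sqrt{a} \cdot c_3^{a}} \qquad \text{(by Lemma 19 (e), (f)).}$$

These bounds on $q_1$ and $q_2$ can be made arbitrarily small by choosing $a$ large enough. This completes the proof of the lemma. □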

This lemma implies a first lower bound on the distribution of $T$:

Lemma 21. For any $c > 0$ there is $s \in \mathbb{N}$ such that for all $n \in \mathbb{N}$ we have:

$$\sum_{i=1}^{sn} i \cdot \mathcal{P}(T = i) \ge c \sqrt{n}.$$

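Proof (of the lemma). Let $a \in \mathbb{N}$ be the number from Lemma 20 for $\varepsilon = \exp(-1)/2$. For all $n \in \mathbb{N}$ and $\theta = 1/\sqrt{n}$ we have:

$$\begin{aligned}
\sum_{i=1}^{an} i \cdot \mathcal{P}(T = i) \cdot \frac{g_{1,\theta}'(\theta)}{g_{0,\theta}(\theta)^{i+1}} &\ge \exp(-\theta) - \varepsilon && \text{(by (13) and Lemma 20)} \\
&\ge \exp(-1) - \varepsilon = \varepsilon && \text{(by the choice of } \varepsilon\text{),}
\end{aligned}$$

so, with Lemma 19 (d) we have for all $n \in \mathbb{N}$:

$$\sum_{i=1}^{an} i \cdot \mathcal{P}(T = i) \ge \frac{\varepsilon}{c_2} \sqrt{n}.$$

For the given number $c > 0$, choose $s := a \lceil c c_2/\varepsilon \rceil^2$. Then it follows for all $m \in \mathbb{N}$:

$$\sum_{i=1}^{sm} i \cdot \mathcal{P}(T = i) \ge c \sqrt{m},$$

which proves the lemma. □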

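Now we can complete the proof of the proposition. By Proposition 10 there is $c_1 > 0$ such that $\mathcal{P}(T \ge n) \le c_1/\sqrt{n}$ for all $n \in \mathbb{N}$. By Lemma 21, there is $s \in \mathbb{N}$ such that

$$\sum_{i=1}^{sn} i \cdot \mathcal{P}(T = i) \ge (2c_1 + 2)\sqrt{n} \quad \text{for all } n \in \mathbb{N}.$$

We have for all $n \in \mathbb{N}$:

$$\begin{aligned}
\sum_{i=n}^{sn} i \cdot \mathcal{P}(T = i) &\ge \sum_{i=1}^{sn} i \cdot \mathcal{P}(T = i) - \sum_{i=1}^{n} i \cdot \mathcal{P}(T = i) \\
&\ge (2c_1 + 2)\sqrt{n} - \sum_{i=0}^{n} \mathcal{P}(T \ge i) && \text{(by the choice of } s \text{ above)} \\
&\ge (2c_1 + 2)\sqrt{n} - 1 - \sum_{i=1}^{n} \frac{c_1}{\sqrt{i}} && \text{(by the choice of } c_1 \text{ above)} \\
&\ge (2c_1 + 1)\sqrt{n} - \int_0^n \frac{c_1}{\sqrt{x}}\, dx \\
&= (2c_1 + 1)\sqrt{n} - 2c_1\sqrt{n} = \sqrt{n}.
\end{aligned}$$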






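It follows:

$$\begin{aligned}
sn \cdot \mathcal{P}(T \ge n) &\ge sn \sum_{i=n}^{sn} \mathcal{P}(T = i) \ge \sum_{i=n}^{sn} i \cdot \mathcal{P}(T = i) \\
&\ge \sqrt{n} && \text{(by the computation above).}
\end{aligned}$$

Hence we have

$$\mathcal{P}(T \ge n) \ge \frac{1}{s\sqrt{n}},$$

which completes the proof of the proposition. □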



6.4 Proof of Proposition 12 

Here is a restatement of Proposition 12. 

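Proposition 12. Let $\Delta_h$ be the pBPA with $\Gamma_h = \{X_1, \ldots, X_h\}$ and the following rules:

$$X_h \stackrel{1/2}{\hookrightarrow} X_h X_h,\quad X_h \stackrel{1/2}{\hookrightarrow} X_{h-1},\quad \ldots,\quad X_2 \stackrel{1/2}{\hookrightarrow} X_2 X_2,\quad X_2 \stackrel{1/2}{\hookrightarrow} X_1,\quad X_1 \stackrel{1/2}{\hookrightarrow} X_1 X_1,\quad X_1 \stackrel{1/2}{\hookrightarrow} \varepsilon.$$

Then $[X_h] = 1$, $E[X_h] = \infty$, and there is $c_h > 0$ with

$$\frac{c_h}{n^{1/2^h}} \le \mathcal{P}(T_{X_h} \ge n) \quad \text{for all } n \in \mathbb{N}.$$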

Proof. Observe that the third statement implies the second statement, since

$$E[X_h] = \sum_{n=1}^{\infty} \mathcal{P}(T_{X_h} \ge n) \ge \sum_{n=1}^{\infty} c_h \cdot n^{-1/2^h} \ge \sum_{n=1}^{\infty} \frac{c_h}{n} = \infty.$$

We proceed by induction on $h$. Let $h = 1$. The pBPA $\Delta_1$ is equivalent to a random walk on $\{0, 1, 2, \ldots\}$, started at 1, with an absorbing barrier at 0. It is well-known (see, e.g., [11]) that the probability that the random walk finally reaches 0 is 1, but that there is $c_1 > 0$ such that the probability that the random walk has not reached 0 after $n$ steps is at least $c_1/\sqrt{n}$. Hence $[X_1] = 1$ and $\mathcal{P}(T_{X_1} \ge n) \ge c_1/\sqrt{n} = c_1 \cdot n^{-1/2}$.
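The base case can be observed numerically. The sketch below computes $\mathcal{P}(T_{X_1} \ge n)$ exactly for the random walk started at 1 by propagating the probability mass of the walks that have not yet reached 0, and reports $\sqrt{n} \cdot \mathcal{P}(T_{X_1} \ge n)$. The function name `tail_probabilities` and the range $n \le 1000$ are hypothetical choices for illustration; a finite check suggests, but does not prove, that the scaled tail stays bounded away from 0.

```python
import math


def tail_probabilities(max_n):
    """Return a list `tail` with tail[n] = P(T >= n) for 1 <= n <= max_n.

    T is the number of steps until the symmetric +-1 walk started at 1 hits 0.
    """
    tail = [None] * (max_n + 1)
    alive = {1: 1.0}   # mass of walks that have not reached 0 yet, by position
    surviving = 1.0    # P(T >= step), i.e. mass still alive before step `step`
    for step in range(1, max_n + 1):
        tail[step] = surviving
        nxt = {}
        absorbed = 0.0
        for position, mass in alive.items():
            for new_position in (position - 1, position + 1):
                if new_position == 0:
                    absorbed += 0.5 * mass
                else:
                    nxt[new_position] = nxt.get(new_position, 0.0) + 0.5 * mass
        alive = nxt
        surviving -= absorbed
    return tail


tail = tail_probabilities(1000)
scaled = [math.sqrt(n) * tail[n] for n in range(1, 1001)]
print(min(scaled), scaled[-1])
```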

Let $h > 1$. The behavior of $\Delta_h$ can be described in terms of a random walk $W_h$ whose states correspond to the number of $X_h$-symbols in the stack. Whenever an $X_h$-symbol is on top of the stack, the total number of $X_h$-symbols in the stack increases by 1 with probability 1/2, or decreases by 1 with probability 1/2, very much like the random walk equivalent to $\Delta_1$. In the second case (i.e., the rule $X_h \stackrel{1/2}{\hookrightarrow} X_{h-1}$ is taken), the random walk $W_h$ resumes only after a run of $\Delta_{h-1}$ (started with a single $X_{h-1}$-symbol) has terminated. By the induction hypothesis, $[X_{h-1}] = 1$, so with probability 1 all spawned "sub-runs" of $\Delta_{h-1}$ terminate. Since $W_h$ also terminates with probability 1, it follows $[X_h] = 1$.

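It remains to show that there is $c_h > 0$ with $\mathcal{P}(T_{X_h} \ge n) \ge c_h \cdot n^{-1/2^h}$ for all $n \ge 1$. Consider, for any $n \ge 1$ and any $\ell > 0$, the event $A_\ell$ that $W_h$ needs at least $\ell$ steps to terminate (not counting the steps of the spawned sub-runs) and that at least one of the spawned sub-runs needs at least $n$ steps to terminate. Clearly, $T_{X_h}(w) \ge n$ holds for all $w \in A_\ell$, so it suffices to find $c_h > 0$ so that for all $n \ge 1$ there is $\ell > 0$ with $\mathcal{P}(A_\ell) \ge c_h \cdot n^{-1/2^h}$. At least half of the steps of $W_h$ are steps down, so whenever $W_h$ needs at least $2\ell$ steps to terminate, it spawns at least $\ell$ sub-runs. It follows:

$$\begin{aligned}
\mathcal{P}(A_\ell) &\ge \mathcal{P}(W_h \text{ needs at least } 2\ell \text{ steps}) \cdot \Bigl(1 - \bigl(\mathcal{P}(T_{X_{h-1}} < n)\bigr)^{\ell}\Bigr) \\
&\ge \frac{c_1}{\sqrt{2\ell}} \cdot \Bigl(1 - \bigl(1 - c_{h-1} \cdot n^{-1/2^{h-1}}\bigr)^{\ell}\Bigr) && \text{(by induction hypothesis)}
\end{aligned}$$

Now we fix $\ell := n^{1/2^{h-1}}$, so that $1/\sqrt{2\ell} = n^{-1/2^h}/\sqrt{2}$. Then the second factor of the product above converges to $1 - e^{-c_{h-1}}$ for $n \to \infty$, so for large enough $n$ the product is at least $c \cdot n^{-1/2^h}$ for every constant $c < \frac{c_1}{\sqrt{2}} \cdot (1 - e^{-c_{h-1}})$.

Hence, we can choose $c_h < \frac{c_1}{\sqrt{2}} \cdot (1 - e^{-c_{h-1}})$ such that $\mathcal{P}(A_\ell) \ge c_h \cdot n^{-1/2^h}$ holds for all $n \ge 1$. □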
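The convergence of the second factor can be illustrated numerically. In the sketch below, `c_prev` stands in for $c_{h-1}$; the value 0.5, the choice $h = 2$, and the function name `second_factor` are arbitrary assumptions for illustration, and $\ell$ is not rounded to an integer.

```python
import math


def second_factor(n, h, c_prev):
    """Evaluate 1 - (1 - c_prev * n^(-1/2^(h-1)))^l with l = n^(1/2^(h-1))."""
    l = n ** (1.0 / 2 ** (h - 1))
    return 1.0 - (1.0 - c_prev / l) ** l


h, c_prev = 2, 0.5  # illustrative values only
limit = 1.0 - math.exp(-c_prev)
values = [second_factor(10 ** k, h, c_prev) for k in (2, 4, 6, 8)]
print(values, limit)  # the values approach the limit 1 - exp(-c_prev) from above
```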
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